What are the most common SQL antipatterns?

All of us who work with relational databases have learned (or are learning) that SQL is different. Eliciting the desired results, and doing so efficiently, involves a tedious process partly characterized by learning unfamiliar paradigms, and finding out that some of our most familiar programming patterns don't work here. What are the common antipatterns you've seen (or yourself committed)?

40 answers

#1

139

I am consistently disappointed by most programmers' tendency to mix their UI-logic in the data access layer:

SELECT
    FirstName + ' ' + LastName as "Full Name",
    case UserRole
        when 2 then 'Admin'
        when 1 then 'Moderator'
        else 'User'
    end as "User's Role",
    case SignedIn
        when 0 then 'Logged in'
        else 'Logged out'
    end as "User signed in?",
    Convert(varchar(100), LastSignOn, 101) as "Last Sign On",
    DateDiff(d, LastSignOn, getDate()) as "Days since last sign on",
    AddrLine1 + ' ' + AddrLine2 + ' ' + AddrLine3 + ' ' +
        City + ', ' + State + ' ' + Zip as "Address",
    'XXX-XX-' + Substring(
        Convert(varchar(9), SSN), 6, 4) as "Social Security #"
FROM Users

Normally, programmers do this because they intend to bind their dataset directly to a grid, and it's just more convenient to have SQL Server format the data server-side than to format it on the client.

Queries like the one shown above are extremely brittle because they tightly couple the data layer to the UI layer. On top of that, this style of programming thoroughly prevents stored procedures from being reusable.
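
One way to decouple the two layers is to have the query return raw column values and do all labelling and formatting in the presentation code. The sketch below is only an illustration: it uses Python's built-in sqlite3 module rather than SQL Server, and the cut-down Users schema, the function names and the sample row are all assumptions.

```python
import sqlite3
from datetime import date

# Hypothetical mapping of role codes to display labels (mirrors the CASE above).
ROLE_LABELS = {2: "Admin", 1: "Moderator"}


def fetch_users(conn):
    """Data layer: return raw column values only, no display formatting."""
    conn.row_factory = sqlite3.Row
    return conn.execute(
        "SELECT FirstName, LastName, UserRole, LastSignOn FROM Users"
    ).fetchall()


def present_user(row, today):
    """Presentation layer: labels and formats live here, not in the SQL."""
    last = date.fromisoformat(row["LastSignOn"])
    return {
        "Full Name": f"{row['FirstName']} {row['LastName']}",
        "User's Role": ROLE_LABELS.get(row["UserRole"], "User"),
        "Last Sign On": last.strftime("%m/%d/%Y"),
        "Days since last sign on": (today - last).days,
    }


# Assumed, simplified schema and sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (FirstName TEXT, LastName TEXT, UserRole INTEGER, LastSignOn TEXT)"
)
conn.execute("INSERT INTO Users VALUES ('Ada', 'Lovelace', 2, '2024-01-10')")

view = [present_user(r, date(2024, 1, 15)) for r in fetch_users(conn)]
```

With this split, a second screen that wants a different date format or role label can reuse fetch_users unchanged.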

#2

103

Here are my top 3.

Number 1. Failure to specify a field list. (Edit: to prevent confusion: this is a production code rule. It doesn't apply to one-off analysis scripts - unless I'm the author.)

SELECT *
Insert Into blah SELECT *

should be

SELECT fieldlist
Insert Into blah (fieldlist) SELECT fieldlist

Number 2. Using a cursor and while loop, when a while loop with a loop variable will do.

DECLARE @LoopVar int

SET @LoopVar = (SELECT MIN(TheKey) FROM TheTable)
WHILE @LoopVar is not null
BEGIN
  -- Do Stuff with current value of @LoopVar
  ...
  --Ok, done, now get the next value
  SET @LoopVar = (SELECT MIN(TheKey) FROM TheTable
    WHERE @LoopVar < TheKey)
END

Number 3. Date logic through string types.

--Trim the time
Convert(datetime, Convert(varchar(10), theDate, 121))

Should be

--Trim the time
DateAdd(dd, DateDiff(dd, 0, theDate), 0)

I've seen a recent spike of "One query is better than two, amiright?"

SELECT *
FROM blah
WHERE (blah.Name = @name OR @name is null)
  AND (blah.Purpose = @Purpose OR @Purpose is null)

This query requires two or three different execution plans depending on the values of the parameters. Only one execution plan is generated and stuck into the cache for this sql text. That plan will be used regardless of the value of the parameters. This results in intermittent poor performance. It is much better to write two queries (one query per intended execution plan).
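
A rough sketch of the "one query per intended plan" idea: build the SQL text from only the predicates that actually apply, so each parameter combination is a distinct statement. The plan-caching behaviour described above is SQL Server's; Python's sqlite3 is used here purely so the example runs, and find_blah and the sample rows are hypothetical.

```python
import sqlite3


def find_blah(conn, name=None, purpose=None):
    """Emit a different SQL text for each combination of supplied filters."""
    clauses, params = [], []
    if name is not None:
        clauses.append("Name = ?")
        params.append(name)
    if purpose is not None:
        clauses.append("Purpose = ?")
        params.append(purpose)

    sql = "SELECT Name, Purpose FROM blah"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    # Values are still bound as parameters; only the predicate list varies.
    return conn.execute(sql, params).fetchall()


# Assumed sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blah (Name TEXT, Purpose TEXT)")
conn.executemany(
    "INSERT INTO blah VALUES (?, ?)", [("a", "x"), ("a", "y"), ("b", "x")]
)

by_name = find_blah(conn, name="a")
by_both = find_blah(conn, name="a", purpose="x")
everything = find_blah(conn)
```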

#3

Human readable password fields, egad. Self explanatory.
Using LIKE against indexed columns, and I'm almost tempted to just say LIKE in general.
Recycling SQL-generated PK values.
Surprised nobody has mentioned the god-table yet. Nothing says "organic" like 100 columns of bit flags, large strings and integers.
Then there's the "I miss .ini files" pattern: storing CSVs, pipe-delimited strings or other parse-required data in large text fields.
And for MS SQL Server, the use of cursors at all. There's a better way to do any given cursor task.

Edited because there are so many!

#4

Don't have to dig deep for it: Not using prepared statements.

#5

Using meaningless table aliases:

from employee t1,
department t2,
job t3,
...

Makes reading a large SQL statement so much harder than it needs to be.

#6

var query = "select COUNT(*) from Users where UserName = '" 
            + tbUser.Text 
            + "' and Password = '" 
            + tbPassword.Text +"'";

Blindly trusting user input
Not using parameterized queries
Cleartext passwords
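
For contrast, here is a minimal sketch of the parameterized form, using Python's sqlite3 instead of the C# above. The Users table, the PasswordHash column and count_matching_users are assumptions for illustration; password hashing itself is out of scope here.

```python
import sqlite3


def count_matching_users(conn, user_name):
    """Bind the value with a ? placeholder instead of concatenating it."""
    row = conn.execute(
        "SELECT COUNT(*) FROM Users WHERE UserName = ?", (user_name,)
    ).fetchone()
    return row[0]


# Assumed sample schema; the stored value is a placeholder, not a real hash.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserName TEXT, PasswordHash TEXT)")
conn.execute("INSERT INTO Users VALUES ('alice', 'not-a-real-hash')")

# A classic injection string is treated as plain data, so it matches nothing.
injected = "' OR '1'='1"
legit_count = count_matching_users(conn, "alice")
injected_count = count_matching_users(conn, injected)
```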

#7

My bugbears are the 450-column Access tables that have been put together by the 8-year-old son of the Managing Director's best friend's dog groomer, and the dodgy lookup table that only exists because somebody doesn't know how to normalise a data structure properly.

Typically, this lookup table looks like this:

ID INT,
Name NVARCHAR(132),
IntValue1 INT,
IntValue2 INT,
CharValue1 NVARCHAR(255),
CharValue2 NVARCHAR(255),
Date1 DATETIME,
Date2 DATETIME

I've lost count of the number of clients I've seen who have systems that rely on abominations like this.

#8

The ones that I dislike the most are:

Using spaces when creating tables, sprocs etc. I'm fine with CamelCase or under_scores, singular or plural, and UPPERCASE or lowercase, but having to refer to a table or column [with spaces], especially if [ it is oddly spaced] (yes, I've run into this), really irritates me.
Denormalized data. A table doesn't have to be perfectly normalized, but when I run into a table of employees that has information about their current evaluation score or their primary anything, it tells me that I will probably need to make a separate table at some point and then try to keep them synced. I will normalize the data first and then if I see a place where denormalization helps, I'll consider it.
Overuse of either views or cursors. Views have a purpose, but when each table is wrapped in a view it's too much. I've had to use cursors a few times, but generally you can use other mechanisms for this.
Access. Can a program be an anti-pattern? We have SQL Server at my work, but a number of people use Access due to its availability, "ease of use" and "friendliness" to non-technical users. There is too much here to go into, but if you've been in a similar environment, you know.

#9

Overuse of temporary tables and cursors.

#10

Using sp_ as the prefix of a stored procedure name, because SQL Server will first search the system procedures location rather than the custom ones.

#11

For storing time values, only UTC timezone should be used. Local time should not be used.
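
A small sketch of that rule, assuming timestamps are stored as ISO-8601 text in SQLite: normalize to UTC on the way in, and convert to a local zone only for display. The events table and both offsets are made up for the example.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, occurred_at_utc TEXT)")

# A local wall-clock time with a hypothetical UTC-5 offset.
local_time = datetime(2024, 6, 1, 9, 30, tzinfo=timezone(timedelta(hours=-5)))

# Normalize to UTC before storing.
stored = local_time.astimezone(timezone.utc).isoformat()
conn.execute("INSERT INTO events VALUES (?, ?)", ("sign_on", stored))

# Convert to the viewer's zone (here a hypothetical UTC+1) only when displaying.
raw = conn.execute("SELECT occurred_at_utc FROM events").fetchone()[0]
shown = datetime.fromisoformat(raw).astimezone(timezone(timedelta(hours=1)))
```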

#12

Re-using a 'dead' field for something it wasn't intended for (e.g. storing user data in a 'Fax' field) - very tempting as a quick fix though!

#13

select some_column, ...
from some_table
group by some_column

and assuming that the result will be sorted by some_column. I've seen this a bit with Sybase where the assumption holds (for now).
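
The fix is simply to state the ordering you rely on. A minimal sketch with Python's sqlite3, where the amount column and the sample rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (some_column TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [("b", 1), ("a", 2), ("b", 3), ("c", 4)],
)

# GROUP BY only defines the groups; ORDER BY is what guarantees the row order.
rows = conn.execute(
    """
    SELECT some_column, SUM(amount)
    FROM some_table
    GROUP BY some_column
    ORDER BY some_column
    """
).fetchall()
```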

#14

Using @@IDENTITY instead of SCOPE_IDENTITY()

Quoted from this answer:

@@IDENTITY returns the last identity value generated for any table in the current session, across all scopes. You need to be careful here, since it's across scopes. You could get a value from a trigger, instead of your current statement.
SCOPE_IDENTITY() returns the last identity value generated for any table in the current session and the current scope. This is generally what you want to use.
IDENT_CURRENT returns the last identity value generated for a specific table in any session and any scope. This lets you specify which table you want the value from, in case the two above aren't quite what you need (very rare). You could use this if you want to get the current IDENTITY value for a table that you have not inserted a record into.

#15

SELECT FirstName + ' ' + LastName as "Full Name", case UserRole when 2 then 'Admin' when 1 then 'Moderator' else 'User' end as "User's Role", case SignedIn when 0 then 'Logged in' else 'Logged out' end as "User signed in?", Convert(varchar(100), LastSignOn, 101) as "Last Sign On", DateDiff(d, LastSignOn, getDate()) as "Days since last sign on", AddrLine1 + ' ' + AddrLine2 + ' ' + AddrLine3 + ' ' + City + ', ' + State + ' ' + Zip as "Address", 'XXX-XX-' + Substring(Convert(varchar(9), SSN), 6, 4) as "Social Security #" FROM Users

Or, cramming everything into one line.

#16

The FROM TableA, TableB WHERE syntax for JOINs rather than FROM TableA INNER JOIN TableB ON
Assuming that a query's results will be returned sorted a certain way without putting an ORDER BY clause in, just because that was the way they showed up during testing in the query tool.

#17

I need to put my own current favorite here, just to make the list complete. My favorite antipattern is not testing your queries.

This applies when:

Your query involves more than one table.
You think you have an optimal design for a query, but don't bother to test your assumptions.
You accept the first query that works, with no clue about whether it's even close to optimized.

And any tests run against atypical or insufficient data don't count. If it's a stored procedure, put the test statement into a comment and save it, with the results. Otherwise, put it into a comment in the code with the results.

#18

Learning SQL in the first six months of their career and never learning anything else for the next 10 years. In particular, not learning or effectively using windowing/analytical SQL features, especially OVER() and PARTITION BY.

Window functions, like aggregate functions, perform an aggregation on a defined set (a group) of rows, but rather than returning one value per group, window functions can return multiple values for each group.
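
To make that concrete, here is a small sketch run through Python's sqlite3. It assumes the bundled SQLite is 3.25 or newer (the first release with window functions); the emp table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("ann", "eng", 100), ("bob", "eng", 80), ("cy", "ops", 50)],
)

# Every input row is kept; each one also carries its department's total
# and its salary rank within that department. Requires SQLite >= 3.25.
rows = conn.execute(
    """
    SELECT name, dept, salary,
           SUM(salary) OVER (PARTITION BY dept) AS dept_total,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
    FROM emp
    ORDER BY dept, dept_rank
    """
).fetchall()
```

A plain GROUP BY dept would collapse this to one row per department; the windowed version returns all three employees.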

See O'Reilly SQL Cookbook Appendix A for a nice overview of windowing functions.

#19

Contrarian view: over-obsession with normalization.

Most SQL/RDBMS systems give you lots of features (transactions, replication) that are quite useful, even with unnormalized data. Disk space is cheap, and sometimes it can be simpler (easier code, faster development time) to manipulate / filter / search fetched data than it is to write up a 1NF schema and deal with all the hassles therein (complex joins, nasty subselects, etc).

I have found that over-normalized systems are often premature optimization, especially during early development stages.

(more thoughts on it... http://writeonly.wordpress.com/2008/12/05/simple-object-db-using-json-and-python-sqlite/)

#20

Temporary Table abuse.

Specifically this sort of thing:

SELECT personid, firstname, lastname, age
INTO #tmpPeople
FROM People
WHERE lastname like 's%'

DELETE FROM #tmpPeople
WHERE firstname = 'John'

DELETE FROM #tmpPeople
WHERE firstname = 'Jon'

DELETE FROM #tmpPeople
WHERE age > 35

UPDATE People
SET firstname = 'Fred'
WHERE personid IN (SELECT personid from #tmpPeople)

Don't build a temporary table from a query, only to delete the rows you don't need.

And yes, I have seen pages of code in this form in production DBs.
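
As a sketch of the alternative, the whole temp-table sequence above can usually be folded into one UPDATE with the filters in its WHERE clause. This version runs on Python's sqlite3 and assumes firstname and age are NOT NULL (with NULLs the two forms would not select quite the same rows); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE People (
        personid  INTEGER PRIMARY KEY,
        firstname TEXT NOT NULL,
        lastname  TEXT NOT NULL,
        age       INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?, ?)",
    [
        (1, "Sam", "Smith", 30),   # the only row that should be renamed
        (2, "John", "Stone", 30),  # excluded by first name
        (3, "Jon", "Sands", 20),   # excluded by first name
        (4, "Sue", "Shaw", 40),    # excluded by age
        (5, "Al", "Brown", 30),    # excluded by last name
    ],
)

# One statement: every row the DELETEs would have removed is simply never selected.
conn.execute(
    """
    UPDATE People
    SET firstname = 'Fred'
    WHERE lastname LIKE 's%'
      AND firstname NOT IN ('John', 'Jon')
      AND age <= 35
    """
)

names = [r[0] for r in conn.execute("SELECT firstname FROM People ORDER BY personid")]
```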

#21

I just put this one together, based on some of the SQL responses here on SO.

It is a serious antipattern to think that triggers are to databases as event handlers are to OOP. There's this perception that just any old logic can be put into triggers, to be fired off when a transaction (event) happens on a table.

Not true. One of the big differences is that triggers are synchronous - with a vengeance, because they are synchronous on a set operation, not on a row operation. On the OOP side, it's exactly the opposite - events are an efficient way to implement asynchronous transactions.

#22

1) I don't know if it's an "official" anti-pattern, but I dislike and try to avoid string literals as magic values in a database column.

An example from MediaWiki's table 'image':

img_media_type ENUM("UNKNOWN", "BITMAP", "DRAWING", "AUDIO", "VIDEO", 
    "MULTIMEDIA", "OFFICE", "TEXT", "EXECUTABLE", "ARCHIVE") default NULL,
img_major_mime ENUM("unknown", "application", "audio", "image", "text", 
    "video", "message", "model", "multipart") NOT NULL default "unknown",

(I just noticed the different casing, another thing to avoid)

I design such cases as int lookups into tables ImageMediaType and ImageMajorMime with int primary keys.
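
A minimal sketch of that lookup-table design, using Python's sqlite3. ImageMediaType comes from the text above; the cut-down image table, the img_media_type_id column and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE ImageMediaType (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE image (
        img_name          TEXT PRIMARY KEY,
        img_media_type_id INTEGER REFERENCES ImageMediaType(id)
    );
    INSERT INTO ImageMediaType (id, name) VALUES (1, 'BITMAP'), (2, 'AUDIO');
    INSERT INTO image VALUES ('cat.png', 1);
    """
)

# An id that is not in the lookup table is rejected by the foreign key.
try:
    conn.execute("INSERT INTO image VALUES ('bad.bin', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The display label is recovered with a join instead of being stored per row.
label = conn.execute(
    """
    SELECT t.name
    FROM image i
    JOIN ImageMediaType t ON t.id = i.img_media_type_id
    WHERE i.img_name = 'cat.png'
    """
).fetchone()[0]
```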

2) date/string conversion that relies on specific NLS settings

CONVERT(NVARCHAR, GETDATE())

without a format identifier
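
The same principle in runnable form: always name the format explicitly so the result cannot change with session or locale settings. This sketch uses SQLite's strftime through Python's sqlite3 with a made-up timestamp; in T-SQL the equivalent is to pass a style argument, for example CONVERT(NVARCHAR(10), GETDATE(), 120).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The format string is spelled out, so the output does not depend on
# any language or regional setting of the session.
formatted = conn.execute(
    "SELECT strftime('%Y-%m-%d', '2009-03-07 14:05:00')"
).fetchone()[0]
```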

#23

Identical subqueries in a query.

#24

The Altered View - A view that is altered too often and without notice or reason. The change will either be noticed at the most inappropriate time or, worse, be wrong and never noticed. Maybe your application will break because someone thought of a better name for that column. As a rule, views should extend the usefulness of base tables while maintaining a contract with consumers. Fix problems, but don't add features or, worse, change behavior; for that, create a new view. To mitigate, do not share views with other projects, and use CTEs when platforms allow. If your shop has a DBA you probably can't change views, but all your views will be outdated and/or useless in that case.
The !Paramed - Can a query have more than one purpose? Probably, but the next person who reads it won't know until deep meditation. Even if you don't need them right now, chances are you will, even if it's "just" to debug. Adding parameters lowers maintenance time and keeps things DRY. If you have a where clause you should have parameters.

The case for no CASE -

SELECT
CASE @problem
  WHEN 'Need to replace column A with this medium to large collection of strings hanging out in my code.'
    THEN 'Create a table for lookup and add to your from clause.'
  WHEN 'Scrubbing values in the result set based on some business rules.'
    THEN 'Fix the data in the database'
  WHEN 'Formatting dates or numbers.'
    THEN 'Apply formatting in the presentation layer.'
  WHEN 'Creating a cross tab'
    THEN 'Good, but in reporting you should probably be using cross tab, matrix or pivot templates'
ELSE 'You probably found another case for no CASE but now I have to edit my code instead of enriching the data...' END

#25

Stored Procedures or Functions without any comments...

#26

Putting stuff in temporary tables. People who switch from SQL Server to Oracle especially have a habit of overusing temporary tables. Just use nested SELECT statements.

#27

The two I find the most, and that can have a significant cost in terms of performance, are:

Using cursors instead of a set-based expression. I guess this one occurs frequently when the programmer is thinking procedurally.
Using correlated sub-queries, when a join to a derived table can do the job.
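
To illustrate the second point, here are both forms of the same report side by side, run through Python's sqlite3. The customers and orders tables and their rows are invented; whether the derived-table form is actually faster depends on the DBMS and its optimizer, so treat this as a sketch of the rewrite rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10), (2, 1, 15), (3, 2, 7);
    """
)

# Correlated form: the inner SELECT is logically evaluated once per customer row.
correlated = conn.execute(
    """
    SELECT c.name,
           (SELECT SUM(o.total) FROM orders o WHERE o.customer_id = c.id) AS spent
    FROM customers c
    ORDER BY c.name
    """
).fetchall()

# Derived-table form: aggregate once, then join.
# Note: the inner join drops customers with no orders; use LEFT JOIN to keep them.
derived = conn.execute(
    """
    SELECT c.name, t.spent
    FROM customers c
    JOIN (SELECT customer_id, SUM(total) AS spent
          FROM orders
          GROUP BY customer_id) t
      ON t.customer_id = c.id
    ORDER BY c.name
    """
).fetchall()
```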

#28

Developers who write queries without having a good idea about what makes SQL applications (both individual queries and multi-user systems) fast or slow. This includes ignorance about:

physical I/O minimization strategies, given that most queries' bottleneck is I/O not CPU
performance impact of different kinds of physical storage access (e.g. lots of sequential I/O will be faster than lots of small random I/O, although less so if your physical storage is an SSD!)
how to hand-tune a query if the DBMS produces a poor query plan
how to diagnose poor database performance, how to "debug" a slow query, and how to read a query plan (or EXPLAIN, depending on your DBMS of choice)
locking strategies to optimize throughput and avoid deadlocks in multi-user applications
importance of batching and other tricks to handle processing of data sets
table and index design to best balance space and performance (e.g. covering indexes, keeping indexes small where possible, reducing data types to minimum size needed, etc.)

#29

Using SQL as a glorified ISAM (Indexed Sequential Access Method) package. In particular, nesting cursors instead of combining SQL statements into a single, albeit larger, statement. This also counts as 'abuse of the optimizer' since in fact there isn't much the optimizer can do. This can be combined with non-prepared statements for maximum inefficiency:

DECLARE c1 CURSOR FOR SELECT Col1, Col2, Col3 FROM Table1

FOREACH c1 INTO a.col1, a.col2, a.col3
    DECLARE c2 CURSOR FOR
        SELECT Item1, Item2, Item3
            FROM Table2
            WHERE Table2.Item1 = a.col2
    FOREACH c2 INTO b.item1, b.item2, b.item3
        ...process data from records a and b...
    END FOREACH
END FOREACH

The correct solution (almost always) is to combine the two SELECT statements into one:

DECLARE c1 CURSOR FOR
    SELECT Col1, Col2, Col3, Item1, Item2, Item3
        FROM Table1, Table2
        WHERE Table2.Item1 = Table1.Col2
        -- ORDER BY Table1.Col1, Table2.Item1

FOREACH c1 INTO a.col1, a.col2, a.col3, b.item1, b.item2, b.item3
    ...process data from records a and b...
END FOREACH

The only advantage to the double loop version is that you can easily spot the breaks between values in Table1 because the inner loop ends. This can be a factor in control-break reports.

Also, sorting in the application is usually a no-no.
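
The same comparison can be sketched in application code with Python's sqlite3: the nested version issues one inner query per outer row, while the joined version issues a single statement and lets the database do the matching and the sorting. The table and column names follow the example above; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (Col1 INTEGER, Col2 INTEGER, Col3 TEXT);
    CREATE TABLE Table2 (Item1 INTEGER, Item2 TEXT, Item3 TEXT);
    INSERT INTO Table1 VALUES (1, 10, 'a'), (2, 20, 'b');
    INSERT INTO Table2 VALUES (10, 'x', 'p'), (10, 'y', 'q'), (20, 'z', 'r');
    """
)

# Antipattern: an inner query for every row of the outer query.
nested = []
outer_rows = conn.execute(
    "SELECT Col1, Col2, Col3 FROM Table1 ORDER BY Col1"
).fetchall()
for col1, col2, col3 in outer_rows:
    inner = conn.execute(
        "SELECT Item1, Item2, Item3 FROM Table2 WHERE Item1 = ? ORDER BY Item2",
        (col2,),
    )
    for item in inner:
        nested.append((col1, col2, col3) + item)

# Preferred: one joined statement, ordered by the database.
joined = conn.execute(
    """
    SELECT Col1, Col2, Col3, Item1, Item2, Item3
    FROM Table1
    JOIN Table2 ON Table2.Item1 = Table1.Col2
    ORDER BY Col1, Item2
    """
).fetchall()
```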

#30

Using primary keys as a surrogate for record addresses and using foreign keys as a surrogate for pointers embedded in records.
