Optimizing a SELECT query that runs slowly on Oracle but quickly on SQL Server

I'm trying to run the following SQL statement in Oracle, and it takes ages to run:

SELECT orderID FROM tasks WHERE orderID NOT IN 
(SELECT DISTINCT orderID FROM tasks WHERE
 engineer1 IS NOT NULL AND engineer2 IS NOT NULL)

If I run just the subquery inside the IN clause, it runs very quickly in Oracle:

SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL

Why does the whole statement take such a long time in Oracle? In SQL Server the whole statement runs quickly.

Alternatively, is there a simpler/different/better SQL statement I should use?

Some more details about the problem:

Each order is made up of many tasks.
Each order is either allocated (one or more of its tasks have engineer1 and engineer2 set) or unallocated (all of its tasks have null values in the engineer fields).
I am trying to find all the orderIDs that are unallocated.

Just in case it makes any difference, there are ~120k rows in the table, and 3 tasks per order, so ~40k different orders.
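
To try the alternatives in this thread without an Oracle instance, here is a minimal sketch that builds a hypothetical miniature `tasks` table (three orders of three tasks each) in an in-memory SQLite database through Python's `sqlite3` module and runs the original NOT IN query. SQLite only stands in for Oracle and SQL Server: it shows which rows come back, not how fast. The engineer names and rows are made up.

```python
import sqlite3

# Hypothetical sample data: (taskID, orderID, engineer1, engineer2)
TASKS = [
    (1, 1, "ann", "bob"), (2, 1, None, None), (3, 1, None, None),  # order 1: allocated
    (4, 2, None, None), (5, 2, None, None), (6, 2, None, None),    # order 2: unallocated
    (7, 3, "ann", None), (8, 3, None, None), (9, 3, None, None),   # order 3: only engineer1 set
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (taskID INTEGER PRIMARY KEY, orderID INTEGER NOT NULL,"
    " engineer1 TEXT, engineer2 TEXT)"
)
# Mirrors the question: besides the taskID key, only orderID is indexed.
conn.execute("CREATE INDEX tasks_orderid ON tasks (orderID)")
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?)", TASKS)

ORIGINAL = """
SELECT orderID FROM tasks WHERE orderID NOT IN
  (SELECT DISTINCT orderID FROM tasks
    WHERE engineer1 IS NOT NULL AND engineer2 IS NOT NULL)
"""
rows = sorted(r[0] for r in conn.execute(ORIGINAL))
print(rows)  # [2, 2, 2, 3, 3, 3]
```

The query returns one row per task rather than one per order. Order 3 counts as unallocated because only a task with both engineer fields set makes an order allocated.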

Responses to answers:

I would prefer a SQL statement that works in both SQL Server and Oracle.
The tasks table only has indexes on orderID and taskID.
I tried the NOT EXISTS version of the statement, but it ran for over 3 minutes before I cancelled it. Perhaps I need a JOIN version of the statement?
There is also an "orders" table with the orderID column, but I was trying to simplify the question by not including it in the original SQL statement.

I guess that in the original SQL statement the subquery is run again for every row of the outer query, even though it is static and should only need to be run once?

Executing

ANALYZE TABLE tasks COMPUTE STATISTICS;

made my original SQL statement execute much faster.

I'm still curious, though, why I have to do this, and whether (and when) I would need to run it again.

18 solutions

#1

Often this type of problem goes away if you analyze the tables involved, so that Oracle has a better idea of the distribution of the data:

ANALYZE TABLE tasks COMPUTE STATISTICS;

#2

The "IN" clause is known to be pretty slow in Oracle. In fact, Oracle's internal query optimizer cannot handle statements with "IN" very well. Try using "EXISTS":

SELECT orderID FROM tasks WHERE orderID NOT EXISTS 
    (SELECT DISTINCT orderID FROM tasks WHERE
         engineer1 IS NOT NULL AND engineer2 IS NOT NULL)

Caution: Please check if the query builds the same data results.
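
One concrete way NOT IN and NOT EXISTS can return different data is NULL handling: if the subquery yields a NULL, `x NOT IN (...)` evaluates to UNKNOWN for every row and nothing comes back. The sketch below runs both forms in SQLite from Python with a hypothetical stray row whose orderID is NULL. If orderID is declared NOT NULL, as it presumably is in the question's schema, the two forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# orderID is deliberately nullable here to expose the difference.
conn.execute(
    "CREATE TABLE tasks (taskID INTEGER PRIMARY KEY, orderID INTEGER,"
    " engineer1 TEXT, engineer2 TEXT)"
)
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?)", [
    (1, 1, "ann", "bob"),     # order 1 is allocated
    (2, 2, None, None),       # order 2 is unallocated
    (3, None, "ann", "bob"),  # hypothetical bad row: allocated task with no order
])

NOT_IN = """
SELECT orderID FROM tasks WHERE orderID NOT IN
  (SELECT orderID FROM tasks
    WHERE engineer1 IS NOT NULL AND engineer2 IS NOT NULL)
"""
NOT_EXISTS = """
SELECT t1.orderID FROM tasks t1 WHERE NOT EXISTS
  (SELECT 1 FROM tasks t2
    WHERE t2.orderID = t1.orderID
      AND t2.engineer1 IS NOT NULL AND t2.engineer2 IS NOT NULL)
"""
not_in = [r[0] for r in conn.execute(NOT_IN)]
not_exists = {r[0] for r in conn.execute(NOT_EXISTS)}
print(not_in)      # [] because the subquery result contains a NULL
print(not_exists)  # contains 2 and None (the row with a NULL orderID)
```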

Edit: oops, the query is not well formed, but the general idea is correct. Oracle has to perform a full table scan for the second (inner) query, build the results, and then compare them with the first (outer) query; that's why it slows down. Try

SELECT t1.orderID FROM tasks t1 WHERE NOT EXISTS 
    (SELECT 1 FROM tasks t2 WHERE
         t2.engineer1 IS NOT NULL AND t2.engineer2 IS NOT NULL AND t2.orderID = t1.orderID)

or something similar ;-)

#3

I would try using joins instead:

SELECT 
    t.orderID 
FROM 
    tasks  t
    LEFT JOIN tasks t1
        ON t.orderID =  t1.orderID
        AND t1.engineer1 IS NOT NULL 
        AND t1.engineer2 IS NOT NULL
WHERE
    t1.orderID IS NULL
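
The LEFT JOIN ... IS NULL pattern is an anti-join: rows of `t` that find no fully allocated partner in `t1` keep NULLs on the `t1` side and survive the WHERE filter. A quick way to check that it matches the original NOT IN query is to run both on hypothetical sample data in SQLite from Python and compare the results as multisets:

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (taskID INTEGER PRIMARY KEY, orderID INTEGER NOT NULL,"
    " engineer1 TEXT, engineer2 TEXT)"
)
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?)", [
    (1, 1, "ann", "bob"), (2, 1, None, None), (3, 1, None, None),
    (4, 2, None, None), (5, 2, None, None), (6, 2, None, None),
    (7, 3, "ann", None), (8, 3, None, None), (9, 3, None, None),
])

LEFT_JOIN = """
SELECT t.orderID
FROM tasks t
LEFT JOIN tasks t1
  ON t.orderID = t1.orderID
 AND t1.engineer1 IS NOT NULL
 AND t1.engineer2 IS NOT NULL
WHERE t1.orderID IS NULL
"""
NOT_IN = """
SELECT orderID FROM tasks WHERE orderID NOT IN
  (SELECT orderID FROM tasks WHERE engineer1 IS NOT NULL AND engineer2 IS NOT NULL)
"""
# Both return one row per task of an unallocated order, so compare counts.
join_rows = Counter(r[0] for r in conn.execute(LEFT_JOIN))
not_in_rows = Counter(r[0] for r in conn.execute(NOT_IN))
print(join_rows == not_in_rows)  # True
```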

Also, your original query would probably be easier to understand if it were written as:

SELECT orderID FROM orders WHERE orderID NOT IN 
(SELECT DISTINCT orderID FROM tasks WHERE
 engineer1 IS NOT NULL AND engineer2 IS NOT NULL)

(assuming you have orders table with all the orders listed)

which can be then rewritten using joins as:

SELECT 
    o.orderID 
FROM 
    orders o
    LEFT JOIN tasks t
        ON o.orderID =  t.orderID
        AND t.engineer1 IS NOT NULL 
        AND t.engineer2 IS NOT NULL
WHERE
    t.orderID IS NULL
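
Driving the query from `orders` also changes which orders can appear: an order with no tasks at all is returned by the orders-based LEFT JOIN but can never come out of a query over `tasks`. A sketch in SQLite from Python, with hypothetical data in which order 4 has no tasks (whether such an order should count as unallocated is a business decision):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderID INTEGER PRIMARY KEY);
CREATE TABLE tasks (taskID INTEGER PRIMARY KEY, orderID INTEGER NOT NULL,
                    engineer1 TEXT, engineer2 TEXT);
INSERT INTO orders VALUES (1), (2), (3), (4);
INSERT INTO tasks VALUES
  (1, 1, 'ann', 'bob'), (2, 1, NULL, NULL),
  (3, 2, NULL, NULL),   (4, 3, 'ann', NULL);
""")

FROM_ORDERS = """
SELECT o.orderID
FROM orders o
LEFT JOIN tasks t
  ON o.orderID = t.orderID
 AND t.engineer1 IS NOT NULL
 AND t.engineer2 IS NOT NULL
WHERE t.orderID IS NULL
ORDER BY o.orderID
"""
FROM_TASKS = """
SELECT DISTINCT orderID FROM tasks WHERE orderID NOT IN
  (SELECT orderID FROM tasks WHERE engineer1 IS NOT NULL AND engineer2 IS NOT NULL)
ORDER BY orderID
"""
from_orders = [r[0] for r in conn.execute(FROM_ORDERS)]
from_tasks = [r[0] for r in conn.execute(FROM_TASKS)]
print(from_orders)  # [2, 3, 4]
print(from_tasks)   # [2, 3]
```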

#4

I agree with TZQTZIO, I don't get your query.

If we assume the query does make sense, then you might want to try using EXISTS, as some suggest, and avoid IN. IN is not always bad, though, and there are likely cases where it can be shown to perform better than EXISTS.

The question title is not very helpful. I could set this query up in one Oracle database and make it run slowly, and make it run fast in another. Many factors determine how the database resolves the query: object statistics, SYS schema statistics, and parameters, as well as server performance. SQL Server vs. Oracle isn't the problem here.

For those interested in query tuning and performance who want to learn more, some Google search terms to try are "oak table oracle" and "oracle jonathan lewis".

#5

Some questions:

How many rows are there in tasks?
What indexes are defined on it?
Has the table been analyzed recently?

Another way to write the same query would be:

select orderid from tasks
minus
select orderid from tasks
where engineer1 IS NOT NULL AND engineer2 IS NOT NULL

However, I would rather expect the query to involve an "orders" table:

select orderid from ORDERS
minus
select orderid from tasks
where engineer1 IS NOT NULL AND engineer2 IS NOT NULL

or

select orderid from ORDERS
where orderid not in
( select orderid from tasks
  where engineer1 IS NOT NULL AND engineer2 IS NOT NULL
)

or

select orderid from ORDERS
where not exists
( select null from tasks
  where tasks.orderid = orders.orderid
  and   engineer1 IS NOT NULL AND engineer2 IS NOT NULL
)
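
When a correlated subquery mixes AND and OR, keep in mind that AND binds more tightly than OR, so `a AND b OR c` is read as `(a AND b) OR c`; without parentheses, the OR branch is no longer tied to the outer order. The sketch below (SQLite via Python, hypothetical data) compares the unparenthesized OR form, the parenthesized OR form, and the AND form used in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderID INTEGER PRIMARY KEY);
CREATE TABLE tasks (taskID INTEGER PRIMARY KEY, orderID INTEGER NOT NULL,
                    engineer1 TEXT, engineer2 TEXT);
INSERT INTO orders VALUES (1), (2), (3);
INSERT INTO tasks VALUES
  (1, 1, 'ann', 'bob'),  -- order 1: both engineers set
  (2, 2, NULL, NULL),    -- order 2: nothing set
  (3, 3, 'ann', NULL);   -- order 3: only engineer1 set
""")

def unallocated(condition):
    """Run the NOT EXISTS query, appending `condition` after the correlation.

    The conditions are fixed strings in this script, never user input.
    """
    sql = f"""
    SELECT orderID FROM orders
    WHERE NOT EXISTS (SELECT NULL FROM tasks
                      WHERE tasks.orderID = orders.orderID AND {condition})
    ORDER BY orderID
    """
    return [r[0] for r in conn.execute(sql)]

bare_or = unallocated("engineer1 IS NOT NULL OR engineer2 IS NOT NULL")
paren_or = unallocated("(engineer1 IS NOT NULL OR engineer2 IS NOT NULL)")
both_and = unallocated("engineer1 IS NOT NULL AND engineer2 IS NOT NULL")
print(bare_or)   # [] because task 1 matches "engineer2 IS NOT NULL" for every order
print(paren_or)  # [2]
print(both_and)  # [2, 3]
```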

#6

I think several people have pretty much the right SQL, but are missing a join between the inner and outer queries.
Try this:

SELECT t1.orderID 
FROM   tasks t1
WHERE  NOT EXISTS
       (SELECT 1 
        FROM   tasks t2 
        WHERE  t2.orderID   = t1.orderID
        AND    t2.engineer1 IS NOT NULL 
        AND    t2.engineer2 IS NOT NULL)
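
The correlation `t2.orderID = t1.orderID` is what makes this work. Without it, the inner query is the same for every outer row, and as soon as any task anywhere is fully allocated, NOT EXISTS is false for every row. A small check in SQLite from Python, on hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (taskID INTEGER PRIMARY KEY, orderID INTEGER NOT NULL,
                    engineer1 TEXT, engineer2 TEXT);
INSERT INTO tasks VALUES
  (1, 1, 'ann', 'bob'), (2, 1, NULL, NULL),
  (3, 2, NULL, NULL),   (4, 2, NULL, NULL);
""")

UNCORRELATED = """
SELECT t1.orderID FROM tasks t1
WHERE NOT EXISTS (SELECT 1 FROM tasks t2
                  WHERE t2.engineer1 IS NOT NULL AND t2.engineer2 IS NOT NULL)
"""
CORRELATED = """
SELECT t1.orderID FROM tasks t1
WHERE NOT EXISTS (SELECT 1 FROM tasks t2
                  WHERE t2.orderID = t1.orderID
                    AND t2.engineer1 IS NOT NULL AND t2.engineer2 IS NOT NULL)
ORDER BY t1.taskID
"""
uncorrelated = [r[0] for r in conn.execute(UNCORRELATED)]
correlated = [r[0] for r in conn.execute(CORRELATED)]
print(uncorrelated)  # []
print(correlated)    # [2, 2]
```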

#7

"Although I'm still curious why I have to do this, and if/when I would need to run it again?"

The statistics give Oracle's cost-based optimizer the information it needs to determine the efficiency of different execution plans: for example, the number of rows in a table, the average width of rows, the highest and lowest values per column, the number of distinct values per column, the clustering factor of indexes, etc.

In a small database you can just set up a job to gather statistics every night and leave it alone. In fact, this is the default under 10g. For larger implementations you usually have to weigh the stability of the execution plans against the way the data changes, which is a tricky balance.

Oracle also has a feature called "dynamic sampling" that samples tables to determine relevant statistics at execution time. It's much more often used with data warehouses, where the overhead of the sampling is outweighed by the potential performance increase for a long-running query.

#8

Isn't your query the same as the following?

SELECT orderID FROM tasks
WHERE engineer1 IS NOT NULL OR engineer2 IS NOT NULL

#9

How about:

SELECT DISTINCT orderID FROM tasks t1 WHERE NOT EXISTS (SELECT * FROM tasks t2 WHERE t2.orderID=t1.orderID AND (engineer1 IS NOT NULL OR engineer2 IS NOT NULL));

I am not a guru of optimization, but maybe you also overlooked some indexes in your Oracle database.

#10

Another option is to use MINUS (EXCEPT on MSSQL)

SELECT orderID FROM tasks
MINUS
SELECT DISTINCT orderID FROM tasks WHERE engineer1 IS NOT NULL 
AND engineer2 IS NOT NULL
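
SQLite, like SQL Server, spells this operator EXCEPT rather than MINUS. Set operators also remove duplicates, so this form returns each unallocated order once (the DISTINCT in the second branch is redundant), while the original NOT IN query returns one row per task. A quick check from Python on hypothetical sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (taskID INTEGER PRIMARY KEY, orderID INTEGER NOT NULL,
                    engineer1 TEXT, engineer2 TEXT);
INSERT INTO tasks VALUES
  (1, 1, 'ann', 'bob'), (2, 1, NULL, NULL),
  (3, 2, NULL, NULL),   (4, 2, NULL, NULL),
  (5, 3, 'ann', NULL),  (6, 3, NULL, NULL);
""")

# EXCEPT is the SQLite / SQL Server spelling of Oracle's MINUS.
SET_DIFF = """
SELECT orderID FROM tasks
EXCEPT
SELECT orderID FROM tasks WHERE engineer1 IS NOT NULL AND engineer2 IS NOT NULL
ORDER BY orderID
"""
unallocated = [r[0] for r in conn.execute(SET_DIFF)]
print(unallocated)  # [2, 3]
```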

#11

If you decide to create an ORDERS table, I'd add an ALLOCATED flag to it and create a bitmap index on it. This approach also forces you to modify the business logic to keep the flag updated, but the queries will be lightning fast. It depends on how critical the queries are for the application.

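A minimal sketch of the flag idea, assuming database triggers are the mechanism that keeps the flag in sync (application code could do the same job). It uses SQLite from Python; SQLite has no bitmap indexes, so an ordinary index on the flag stands in for the Oracle bitmap index. The table, index, and trigger names are hypothetical, and deleting tasks or moving them between orders is not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderID INTEGER PRIMARY KEY,
                     allocated INTEGER NOT NULL DEFAULT 0);
CREATE TABLE tasks (taskID INTEGER PRIMARY KEY,
                    orderID INTEGER NOT NULL REFERENCES orders (orderID),
                    engineer1 TEXT, engineer2 TEXT);
-- Plain index standing in for Oracle's bitmap index.
CREATE INDEX orders_allocated ON orders (allocated);

-- Hypothetical triggers: recompute the flag for the affected order.
CREATE TRIGGER tasks_after_insert AFTER INSERT ON tasks BEGIN
  UPDATE orders SET allocated = EXISTS (
    SELECT 1 FROM tasks
    WHERE tasks.orderID = NEW.orderID
      AND engineer1 IS NOT NULL AND engineer2 IS NOT NULL)
  WHERE orders.orderID = NEW.orderID;
END;
CREATE TRIGGER tasks_after_update AFTER UPDATE OF engineer1, engineer2 ON tasks BEGIN
  UPDATE orders SET allocated = EXISTS (
    SELECT 1 FROM tasks
    WHERE tasks.orderID = NEW.orderID
      AND engineer1 IS NOT NULL AND engineer2 IS NOT NULL)
  WHERE orders.orderID = NEW.orderID;
END;
""")

conn.executemany("INSERT INTO orders (orderID) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    [(1, 1, None, None), (2, 1, None, None), (3, 2, None, None)],
)

def unallocated_orders():
    # The query no longer touches tasks at all.
    return [r[0] for r in conn.execute(
        "SELECT orderID FROM orders WHERE allocated = 0 ORDER BY orderID")]

conn.execute("UPDATE tasks SET engineer1 = 'ann', engineer2 = 'bob' WHERE taskID = 1")
before = unallocated_orders()
print(before)  # [2]

conn.execute("UPDATE tasks SET engineer1 = NULL WHERE taskID = 1")
after = unallocated_orders()
print(after)   # [1, 2]
```
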
Regarding the answers, the simpler the better in this case. Forget subqueries, joins, DISTINCT and GROUP BY; they are not needed at all!

#12

The Oracle optimizer does a good job of processing MINUS statements. If you re-write your query using MINUS, it is likely to run quite quickly:

SELECT orderID FROM tasks
MINUS
SELECT DISTINCT orderID FROM tasks WHERE
 engineer1 IS NOT NULL AND engineer2 IS NOT NULL

#13

What proportion of the rows in the table meet the condition "engineer1 IS NOT NULL AND engineer2 IS NOT NULL"?

This tells you (roughly) whether it might be worth trying to use an index to retrieve the associated orderIDs.

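One way to measure that proportion is a single aggregate over the table. A sketch run through Python's `sqlite3` module on hypothetical data; the SELECT itself uses only standard CASE and AVG, so it should also run in Oracle and SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (taskID INTEGER PRIMARY KEY, orderID INTEGER NOT NULL,
                    engineer1 TEXT, engineer2 TEXT);
INSERT INTO tasks VALUES
  (1, 1, 'ann', 'bob'), (2, 1, NULL, NULL), (3, 1, NULL, NULL),
  (4, 2, NULL, NULL),   (5, 2, NULL, NULL), (6, 2, NULL, NULL);
""")

# Fraction of rows where both engineer columns are filled in.
SELECTIVITY = """
SELECT AVG(CASE WHEN engineer1 IS NOT NULL AND engineer2 IS NOT NULL
                THEN 1.0 ELSE 0.0 END)
FROM tasks
"""
(fraction,) = conn.execute(SELECTIVITY).fetchone()
print(f"{fraction:.1%} of rows match")  # 16.7% of rows match
```
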
Another way to write the query in Oracle that would handle unindexed cases very well would be:

select distinct orderid
from
(
select orderid,
       max(case when engineer1 is null and engineer2 is null then 0 else 1 end)
          over (partition by orderid)
          as max_null_finder
from   tasks
)
where max_null_finder = 0
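
The window-function approach can be tried in SQLite (window functions need SQLite 3.25 or later) from Python, with the CASE expression closed by END. Note that this CASE treats a task as allocated when either engineer column is set, so on hypothetical data where one task has only engineer1 set, it returns fewer orders than the original NOT IN query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (taskID INTEGER PRIMARY KEY, orderID INTEGER NOT NULL,
                    engineer1 TEXT, engineer2 TEXT);
INSERT INTO tasks VALUES
  (1, 1, 'ann', 'bob'), (2, 1, NULL, NULL),
  (3, 2, NULL, NULL),   (4, 2, NULL, NULL),
  (5, 3, 'ann', NULL),  (6, 3, NULL, NULL);
""")

WINDOWED = """
SELECT DISTINCT orderid
FROM (
  SELECT orderid,
         MAX(CASE WHEN engineer1 IS NULL AND engineer2 IS NULL THEN 0 ELSE 1 END)
           OVER (PARTITION BY orderid) AS max_null_finder
  FROM tasks
)
WHERE max_null_finder = 0
ORDER BY orderid
"""
windowed = [r[0] for r in conn.execute(WINDOWED)]
print(windowed)  # [2]; order 3 is excluded because task 5 has engineer1 set
```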

#14

New take.

Iff:

The COUNT() function does not count NULL values

and

You want the orderID of all tasks where none of the tasks have either engineer1 or engineer2 set to a value

then this should do what you want:

SELECT orderID
FROM tasks
GROUP BY orderID
HAVING COUNT(engineer1) = 0 AND COUNT(engineer2) = 0

Please test it.
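
The premise that COUNT(column) skips NULLs, while COUNT(*) counts rows, is easy to test. A sketch in SQLite from Python with hypothetical data, running the GROUP BY query from this answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (taskID INTEGER PRIMARY KEY, orderID INTEGER NOT NULL,
                    engineer1 TEXT, engineer2 TEXT);
INSERT INTO tasks VALUES
  (1, 1, 'ann', 'bob'), (2, 1, NULL, NULL),
  (3, 2, NULL, NULL),   (4, 2, NULL, NULL);
""")

# COUNT(*) counts rows; COUNT(engineer1) counts only non-NULL values.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(engineer1), COUNT(engineer2) FROM tasks"
).fetchone()
print(counts)  # (4, 1, 1)

GROUPED = """
SELECT orderID
FROM tasks
GROUP BY orderID
HAVING COUNT(engineer1) = 0 AND COUNT(engineer2) = 0
ORDER BY orderID
"""
grouped = [r[0] for r in conn.execute(GROUPED)]
print(grouped)  # [2]
```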

#15

I agree with ΤΖΩΤΖΙΟΥ and wearejimbo that your query should be...

SELECT DISTINCT orderID FROM Tasks 
WHERE Engineer1 IS NULL OR Engineer2 IS NULL;

I don't know about SQL Server, but in Oracle this query won't be able to use an ordinary B-tree index on the engineer columns, because Oracle does not store index entries whose key columns are all NULL. The solution would be to rewrite the query so that a function-based index containing only the NULL-value rows could be created. This could be done with NVL2, but would likely not be portable to SQL Server.

I think the best answer is one that does not meet your criteria: write a different statement for each platform, whichever works best on that platform.

#16

If you have no index on the Engineer1 and Engineer2 columns, then you are always going to get a table scan in SQL Server, and a full table scan in Oracle.

If you just need the orders that have unallocated tasks, then the following should work fine on both platforms, but you should also consider adding indexes to the Tasks table to improve query performance.

SELECT DISTINCT orderID 
FROM tasks 
WHERE (engineer1 IS NULL OR engineer2 IS NULL)

#17

Here is an alternate approach which I think gives what you want:

SELECT orderID
 FROM tasks
 GROUP BY orderID
 HAVING COUNT(engineer1) = 0 OR COUNT(engineer2) = 0

I'm not sure if you want "AND" or "OR" in the HAVING clause. It sounds like according to business logic these two fields should either both be populated or both be NULL; if this is guaranteed then you could reduce the condition to just checking engineer1.

Your original query would, I think, give multiple rows per orderID, whereas mine will only give one. I am guessing this is OK since you are only fetching the orderID.

#18

Sub-queries are "bad" with Oracle. It's generally better do use joins.

Here's an article on how to rewrite your subqueries with join : http://www.dba-oracle.com/sql/t_rewrite_subqueries_performance.htm
