Oracle 11g - How to optimize a slow parallel INSERT SELECT?

Date: 2021-02-02 13:50:20

We want to speed up the parallel insert statement below. We expect it to insert around 80 million records, and it currently takes around 2 hours to finish.

INSERT /*+ PARALLEL(STAGING_EX,16) APPEND NOLOGGING */ INTO STAGING_EX
  (ID, TRAN_DT, RECON_DT_START, RECON_DT_END, RECON_CONFIG_ID, RECON_PM_ID)
SELECT /*+ PARALLEL(PM,16) */
       SEQ_RESULT_ID.nextval, sysdate, sysdate, sysdate,
       '8a038312403e859201405245eed00c42', T1.ID
  FROM PM T1
 WHERE STATUS = 1
   AND NOT EXISTS (SELECT 1 FROM RESULT
                    WHERE T1.ID = RECON_PM_ID
                      AND CREATE_DT >= sysdate - 60)
   AND UPLOAD_DT >= sysdate - 1
   AND (FUND_SRC_TYPE = :1)

We think that caching the results of the NOT EXISTS subquery would speed up the inserts. How do we perform that caching? Any other ideas on how to speed up the insert?

Please see below for the plan statistics from Enterprise Manager. We also noticed that the statements are not being run in parallel. Is this normal?


Edit: By the way, the sequence cache is already set to 1M.
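For reference, a sequence cache of that size would be set with a statement like the following (a sketch; it assumes the session owns or can alter SEQ_RESULT_ID):

```sql
-- Cache 1,000,000 values so NEXTVAL is served from memory
-- instead of updating the data dictionary on every call.
ALTER SEQUENCE SEQ_RESULT_ID CACHE 1000000;
```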

3 Answers

#1


3  

Try using bind variables, especially where nested loops might happen. For example, you can use one in a case like

CREATE_DT >= :YOUR_DATE instead of CREATE_DT >= sysdate - 60 

I think this would explain why the lowest step of your execution plan shows 180 million executions even though the rest of the plan is still at only 8 million of your 79 million rows.
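Applied to the statement in the question, the suggestion looks like this (a sketch; :cutoff_dt and :upload_cutoff_dt are hypothetical bind names the caller would set to SYSDATE - 60 and SYSDATE - 1):

```sql
INSERT /*+ PARALLEL(STAGING_EX,16) APPEND */ INTO STAGING_EX
  (ID, TRAN_DT, RECON_DT_START, RECON_DT_END, RECON_CONFIG_ID, RECON_PM_ID)
SELECT /*+ PARALLEL(PM,16) */
       SEQ_RESULT_ID.nextval, sysdate, sysdate, sysdate,
       '8a038312403e859201405245eed00c42', T1.ID
  FROM PM T1
 WHERE STATUS = 1
   AND NOT EXISTS (SELECT 1 FROM RESULT
                    WHERE T1.ID = RECON_PM_ID
                      AND CREATE_DT >= :cutoff_dt)    -- bind instead of sysdate - 60
   AND UPLOAD_DT >= :upload_cutoff_dt                 -- bind instead of sysdate - 1
   AND FUND_SRC_TYPE = :1
```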

#2


5  

Improve the statistics. The estimated number of rows is 1, but the actual number of rows is over 7 million and counting. This causes the execution plan to use a nested loop instead of a hash join. A nested loop works better for small amounts of data, and a hash join works better for large amounts of data. Fixing that may be as easy as ensuring the relevant tables have accurate, current statistics. This can usually be done by gathering statistics with the default settings, for example: exec dbms_stats.gather_table_stats('SIRS_UATC1', 'TBL_RECON_PM');.

If that doesn't improve the cardinality estimate, try a dynamic sampling hint, such as /*+ dynamic_sampling(5) */. For such a long-running query it is worth spending a little extra time up front sampling data if it leads to a better plan.
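The statement-level form of the hint goes in the query's SELECT clause, for example (a sketch; the rest of the statement is unchanged):

```sql
-- Sample table blocks at parse time (level 5) so the optimizer gets a
-- realistic cardinality for the anti-join against RESULT.
SELECT /*+ PARALLEL(PM,16) dynamic_sampling(5) */
       SEQ_RESULT_ID.nextval, sysdate, sysdate, sysdate,
       '8a038312403e859201405245eed00c42', T1.ID
  FROM PM T1
 WHERE STATUS = 1
   -- ... remainder of the original predicates unchanged
```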

Use statement-level parallelism instead of object-level parallelism. This is probably the most common mistake with parallel SQL. If you use object-level parallelism, the hint must reference the alias of the object. Since 11gR2 there is no need to worry about specifying objects; this statement only needs a single hint: INSERT /*+ PARALLEL(16) APPEND */ .... Note that NOLOGGING is not a real hint.
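Putting that together, a corrected sketch follows. One assumption worth checking: in 11g, parallel DML is disabled by default at the session level, and without enabling it the SELECT side can run in parallel while the INSERT itself stays serial, which would match the observation that the statement is not running in parallel.

```sql
-- Without this, the insert side of a parallel DML statement runs serially.
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ PARALLEL(16) APPEND */ INTO STAGING_EX
  (ID, TRAN_DT, RECON_DT_START, RECON_DT_END, RECON_CONFIG_ID, RECON_PM_ID)
SELECT SEQ_RESULT_ID.nextval, sysdate, sysdate, sysdate,
       '8a038312403e859201405245eed00c42', T1.ID
  FROM PM T1
 WHERE STATUS = 1
   AND NOT EXISTS (SELECT 1 FROM RESULT
                    WHERE T1.ID = RECON_PM_ID
                      AND CREATE_DT >= sysdate - 60)
   AND UPLOAD_DT >= sysdate - 1
   AND FUND_SRC_TYPE = :1;

-- A direct-path (APPEND) insert must be committed before the session
-- can read the table again.
COMMIT;
```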

#3


0  

I can see two big problems:

1 - The PARALLEL hint in the SELECT does NOT work, because it must reference the table alias, i.e. it should be /*+ PARALLEL(T1,16) */.

2 - The SELECT is not optimal; it would be better to avoid the NOT EXISTS expression.
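One way to avoid the correlated NOT EXISTS is an outer-join anti-join rewrite (a sketch; whether it actually helps depends on whether the optimizer was already unnesting the subquery into a hash anti-join):

```sql
SELECT SEQ_RESULT_ID.nextval, sysdate, sysdate, sysdate,
       '8a038312403e859201405245eed00c42', T1.ID
  FROM PM T1
  LEFT JOIN (SELECT DISTINCT RECON_PM_ID
               FROM RESULT
              WHERE CREATE_DT >= sysdate - 60) R
    ON R.RECON_PM_ID = T1.ID
 WHERE R.RECON_PM_ID IS NULL        -- keep only PM rows with no recent RESULT row
   AND T1.STATUS = 1
   AND T1.UPLOAD_DT >= sysdate - 1
   AND T1.FUND_SRC_TYPE = :1
```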
