I have a database infrastructure where we regularly (at least once a day) replicate the full content of tables from a source database to approximately 20 target databases. Because of the replication code in use (we have to use regular Oracle queries and have no control over, or direct access to, the source database), this results in 20 full-table sorts of the source table.
Is there any way to optimize for this in the query? I'm looking for something that would basically tell Oracle "I'm going to be repeatedly sorting this entire table". MySQL had an option with myisamchk where you could tell it to sort a table and keep it in sorted order, but obviously that wouldn't apply here for multiple reasons.
Currently, there are also some intermediate tables involved (sync from A to B, then from B to C). We do have control over the intermediate tables, so if there are tuning options there, those would be useful as well.
Generally, almost all of the queries have this very simple form:
select a, b, c, d, e, ... z from tbl1 order by a, b, c, d, e, ... z;
I'm aware of Streams, but as described above, the primary source tables are outside of our control, so we won't be able to use Streams there. (Additionally, those source tables are rebuilt completely from a snapshot daily, so Streams wouldn't really work anyway.)
5 solutions
#1
You could look into the multi-table INSERT feature. It should perform a single full scan of the source and insert into multiple tables. Consider (10gR2):
SQL> CREATE TABLE t1 (ID NUMBER);
Table created
SQL> CREATE TABLE t2 (ID NUMBER);
Table created
SQL> INSERT ALL
2 INTO t1 VALUES (d_id)
3 INTO t2 VALUES (d_id)
4 /* your select goes here */
5 SELECT ROWNUM d_id FROM dual d CONNECT BY LEVEL <= 5;
10 rows inserted
SQL> SELECT COUNT(*) FROM t1;
COUNT(*)
----------
5
SQL> SELECT COUNT(*) FROM t2;
COUNT(*)
----------
5
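You will have to check whether it works over database links. Oracle documents a restriction that a multi-table insert cannot insert into a remote table, so the target tables would have to be local to the database running the statement, with only the SELECT reading over the link.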
#2
Two things would help with the sorting issue. First, index the columns that you are sorting on (and the columns you join the tables on), if those indexes are not there already. Second, you could create materialized views that are built in sorted order. Note that Oracle applies an ORDER BY in a materialized view definition only when the view is first created, not on refresh, so each query still needs its own ORDER BY to guarantee the order.
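As a minimal sketch of the index suggestion, assuming you control the intermediate table: the names tbl1_stage and tbl1_stage_sort_ix are hypothetical, and the three columns stand in for the full a ... z list. If the index covers every selected column in ORDER BY order and at least one indexed column is NOT NULL, Oracle can read the rows in index order and skip the sort, but whether the optimizer actually does so depends on statistics and cost, so check the plan.

```sql
-- Hypothetical intermediate table; names and types are illustrative only.
CREATE TABLE tbl1_stage (
  a NUMBER       NOT NULL,  -- NOT NULL guarantees every row appears in the index
  b VARCHAR2(30),
  c DATE
);

-- Index columns in the same order as the ORDER BY list.
CREATE INDEX tbl1_stage_sort_ix ON tbl1_stage (a, b, c);

-- Check whether the plan shows an INDEX FULL SCAN without a SORT ORDER BY step.
EXPLAIN PLAN FOR
  SELECT a, b, c FROM tbl1_stage ORDER BY a, b, c;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the plan still shows a full table scan followed by SORT ORDER BY, the optimizer has judged that cheaper for this data, and the index brings no benefit for this query.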
#3
You don't say exactly how the replication is done or the data volumes involved (or why you are sorting the data).
If the aim is to minimise the impact on the source database, your best bet may be to extract into an intermediate file and load the file into the destination databases. The sort could be done on the intermediate file (if plain text), or as part of either the export or import into the destination databases.
In the source database:
create table export_emp_info
organization external
( type oracle_datapump
default directory DATA_PUMP_DIR
location ('emp.dmp')
) as select emp_id, emp_name, dept_id from emp order by dept_id
/
Then copy the file and import it in the destination database:
create table import_emp_info
(EMP_ID NUMBER(12),
EMP_NAME VARCHAR2(100),
DEPT_ID NUMBER)
organization external
( type oracle_datapump
default directory DATA_PUMP_DIR
location ('emp.dmp')
)
/
insert into emp_info select * from import_emp_info;
If you don't want, or can't have, the external table on the source database, you can use a straight expdp of the emp table (possibly with NETWORK_LINK if you have limited access to the source database's directory structure) and the QUERY parameter to do the ordering.
#4
You could load data from source table A to an intermediate table B and then do a partition exchange between B and destination table C. Exact replication, no sorting involved.
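The partition exchange could look roughly like the sketch below. It assumes B and C live in the same database (the exchange is a local dictionary operation), that C is a partitioned table with a single catch-all partition, and that the Partitioning option is available. All names (b_stage, c_target, p_all, src_link) are hypothetical.

```sql
-- Destination table C: partitioned, with one partition that holds every row.
CREATE TABLE c_target (
  a NUMBER,
  b VARCHAR2(30)
)
PARTITION BY RANGE (a) (
  PARTITION p_all VALUES LESS THAN (MAXVALUE)
);

-- Intermediate table B: same column layout, not partitioned.
CREATE TABLE b_stage (
  a NUMBER,
  b VARCHAR2(30)
);

-- Load B from the source with a direct-path insert; no ORDER BY is needed.
INSERT /*+ APPEND */ INTO b_stage
  SELECT a, b FROM tbl1@src_link;
COMMIT;

-- Swap the segments: C now holds the fresh data, B holds the previous copy.
ALTER TABLE c_target
  EXCHANGE PARTITION p_all WITH TABLE b_stage
  WITHOUT VALIDATION;
```

After the exchange, b_stage contains the old contents of c_target, so it would need to be truncated before the next load.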
#5
This insert/update/delete (I/U/D) form of replication is what the MERGE command is there for. It's very doubtful that an expensive sort-merge would be required, and I'd expect to see hash joins instead. As long as the hash table fits in memory, the hash join is barely more expensive than scanning the tables.
A handy optimisation is to store a hash value based on the non-key attributes, so that you can join between source and target tables on the key column(s) and compare small hash values instead of the full set of columns - change detection made easy.
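A rough sketch of MERGE with a stored hash follows. It assumes a target table tgt with key column id, non-key columns col1 and col2, and an extra row_hash column; the source is read as tbl1@src_link. All of these names are hypothetical. ORA_HASH returns a 32-bit value, so two different rows can occasionally share a hash; on 12c and later, STANDARD_HASH gives a much wider hash if that risk matters.

```sql
-- Insert new rows and update only rows whose non-key attributes changed.
MERGE INTO tgt t
USING (
  SELECT id, col1, col2,
         ORA_HASH(col1 || '|' || col2) AS row_hash  -- hash of the non-key columns
  FROM   tbl1@src_link
) s
ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.col1 = s.col1,
             t.col2 = s.col2,
             t.row_hash = s.row_hash
  WHERE t.row_hash <> s.row_hash     -- compare one small value, not every column
     OR t.row_hash IS NULL           -- rows loaded before the hash column existed
WHEN NOT MATCHED THEN
  INSERT (id, col1, col2, row_hash)
  VALUES (s.id, s.col1, s.col2, s.row_hash);

-- MERGE does not remove rows that have vanished from the source,
-- so the delete part is handled by a separate statement.
DELETE FROM tgt t
WHERE NOT EXISTS (SELECT 1 FROM tbl1@src_link s WHERE s.id = t.id);

COMMIT;
```

Neither statement has an ORDER BY, so the optimizer is free to use hash joins (or a hash anti-join for the DELETE) rather than sorting the full table.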