I have two database schemas containing millions of records (60 to 100 million; let's assume student records).
The first is the staging schema, the second is the target prod schema.
I would like to check whether a user in the staging schema already exists in the prod schema before I copy it over (if it exists, apply some merge logic).
I have some PL/SQL code that runs sequentially and matches records, but the process is extremely slow, even after indexing and performance tuning.
Are there any matchers, or a way to multithread a PL/SQL function, that can be used? Is there a better alternative in Oracle that I might be missing?
One possible solution is to copy some of the data (the data that participates in the duplicate check) from the prod schema and perform the comparison in the staging schema, but the overhead of copying the data might be the same as that of comparing.
Sample record:
Student_first_name,Student_Last_name,SSN
foo,bar,123456
1 solution
#1
First - copying the data between schemas would not benefit your performance; Oracle does not run intra-schema queries any faster than cross-schema ones.
Second - using a single SQL statement to identify your duplicate records (or the missing records, whichever is the smaller part of the table) and then running your PL/SQL code on those rows alone may help greatly. You can hold those rows in a cursor or flag them using a dedicated column. This helps especially if the amount of data added each day is negligible in comparison to the full prod table.
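As a rough sketch of that second point, the statements below show one way the set-based check could look. The table names `staging.student` and `prod.student`, the flag column `is_dup`, and the choice of `SSN` as the match key are all assumptions made for illustration; adjust them to your real schema and matching rules.

```sql
-- Hypothetical objects: staging.student, prod.student, flag column is_dup.
-- Assumption: SSN alone identifies the same student in both schemas.

-- 1) Identify staging rows that already exist in prod with one query,
--    instead of probing prod once per row from a PL/SQL loop.
SELECT s.ssn
FROM   staging.student s
WHERE  EXISTS (SELECT 1
               FROM   prod.student p
               WHERE  p.ssn = s.ssn);

-- 2) Or flag them in a dedicated column so the existing PL/SQL merge
--    logic only has to visit the flagged rows.
UPDATE staging.student s
SET    s.is_dup = 'Y'
WHERE  EXISTS (SELECT 1
               FROM   prod.student p
               WHERE  p.ssn = s.ssn);

-- 3) A variation that goes beyond the suggestion above: if the merge
--    logic is simple enough to express in SQL, a single MERGE can
--    update the matches and insert the rest in one pass.
MERGE INTO prod.student p
USING staging.student s
ON (p.ssn = s.ssn)
WHEN MATCHED THEN
  UPDATE SET p.student_first_name = s.student_first_name,
             p.student_last_name  = s.student_last_name
WHEN NOT MATCHED THEN
  INSERT (student_first_name, student_last_name, ssn)
  VALUES (s.student_first_name, s.student_last_name, s.ssn);
```

Note that the `MERGE` variant expects each `SSN` to appear at most once in staging; if the same key occurs several times there, Oracle may reject the statement, so deduplicate staging first.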