When an SSIS package copies data from a source to a destination, where the source is a SQL query that uses GROUP BY and the destination is a table, does the data at a given row position in the source have to match the data at the same row position in the destination table?
sagar
2 solutions
#1
You can use a clustered index to force things to be stored in an ordered way, but as Peter notes this has a performance penalty for incremental updates.
Are you concerned about getting things out in order? That calls for an ORDER BY on your queries, or perhaps you should create a standardised view that shows things in the order you want.
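A minimal T-SQL sketch of the ORDER BY approach. The names `dbo.SalesSummary`, `dbo.vSalesSummary`, `Region`, `ProductId` and `TotalAmount` are hypothetical, chosen only for illustration. Note that SQL Server rejects ORDER BY inside a view definition unless TOP, OFFSET or FOR XML is also specified, and even then it does not guarantee the order of rows returned when the view is queried, so the ORDER BY belongs on the query that reads the view.

```sql
-- Hypothetical standardised view over the destination table.
CREATE VIEW dbo.vSalesSummary
AS
SELECT Region, ProductId, TotalAmount
FROM dbo.SalesSummary;
GO

-- Row order is guaranteed only by the ORDER BY on the outermost query.
SELECT Region, ProductId, TotalAmount
FROM dbo.vSalesSummary
ORDER BY Region, ProductId;
```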
#2
It's a performance question, really. Tables have no logical ordering. Of course, the data does have a physical order on disk, and I/O has a significant effect on performance, so the best approach will depend on a) how the table is being populated (complete refresh vs. incremental update) and b) how the table is used downstream.
You could create a clustered index on the target table with the same columns as you have in the GROUP BY clause. This will physically order the data on disk by the keys of the clustered index.
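As a sketch, assuming the package's source query groups by two hypothetical columns `Region` and `ProductId` and loads a hypothetical target table `dbo.SalesSummary`, the clustered index could look like this:

```sql
-- Assumed source query in the SSIS package (illustrative only):
--   SELECT Region, ProductId, SUM(Amount) AS TotalAmount
--   FROM dbo.Sales
--   GROUP BY Region, ProductId;

-- Clustered index on the target, keyed on the same columns as the GROUP BY.
CREATE CLUSTERED INDEX CIX_SalesSummary_Region_ProductId
    ON dbo.SalesSummary (Region, ProductId);
```

The index name is arbitrary. Because a GROUP BY produces one row per key combination, the index could also be declared UNIQUE if the table only ever holds one load's worth of data.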
If the target table is completely repopulated each time the package is run (drop-recreate or truncate), this may be a good design, since the incoming data will probably be in the right order.
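A rough sketch of the full-refresh case, written as plain T-SQL rather than as SSIS tasks. The table and column names (`dbo.Sales`, `dbo.SalesSummary`, `Region`, `ProductId`, `Amount`, `TotalAmount`) are hypothetical. GROUP BY on its own does not guarantee the order in which rows are returned, so this sketch adds an explicit ORDER BY on the clustered index keys to make the incoming rows arrive in key order.

```sql
-- Empty the target before each load (complete refresh).
TRUNCATE TABLE dbo.SalesSummary;

-- Reload from the grouped source query, sorted by the clustered index keys.
INSERT INTO dbo.SalesSummary (Region, ProductId, TotalAmount)
SELECT Region, ProductId, SUM(Amount) AS TotalAmount
FROM dbo.Sales
GROUP BY Region, ProductId
ORDER BY Region, ProductId;
```

In a package, the TRUNCATE would typically sit in an Execute SQL Task and the SELECT would be the data flow's source query.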
If the target table is incrementally updated each time the package is run, this may be a bad design, since the database will have to interleave the incoming data with existing data on each insert, which can be quite expensive.