How do I limit the number of records processed in an SSIS package?

Posted: 2021-05-01 10:28:31

I have a table with 7M records I want to trim down to 10k for dev. I tried a delete, but the whole world was nearly overpowered by the transaction log size, so I truncated the table.


Now I wish to insert 10k records from the original table into my dev table, but it has an identity column and many, many other columns, so I thought I'd try SSIS (through the wizard), which handles the identity nicely but gives me no place to edit a query. So I quickly made a view with a TOP clause and changed the RowSet property of the source to the view. Now everything fails because nothing sees the view, although I copied and pasted the view name from my CREATE VIEW statement, which fails when run a second time because, lo, the view actually does exist.


Does SSIS define which DB objects are used when a package is created, which would exclude the new view, and if so, how can I refresh that?


5 answers

#1


1  

There's really no need to use SSIS to do this. You should be able to insert the records using SQL. First, you will need to set IDENTITY_INSERT to on. Then, you should be able to execute something like this:


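-- Allow explicit values to be inserted into the identity column of the dev table
SET IDENTITY_INSERT db.schema.dev_table ON

-- With IDENTITY_INSERT ON, SQL Server requires an explicit column list;
-- SELECT * on its own is rejected. Id, Col1, Col2 are placeholder column names.
INSERT INTO dev_table (Id, Col1, Col2) SELECT TOP (10000) Id, Col1, Col2 FROM prod_table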

#2


1  

Ed is correct: SSIS is overkill for this task, especially as you are only inserting 10K records.


Assuming the DEV table's schema is identical to the production table's, the script Ed posted will work just fine.


If the schema is different, you can list the columns explicitly, including the identity column (remembering to set IDENTITY_INSERT back to OFF afterwards). For example:


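-- Allow explicit values in the identity column for this session
SET IDENTITY_INSERT dbo.dev_table ON;
-- The column list must name the identity column (Id) explicitly
INSERT INTO dbo.dev_table (Id, Col1, Col2, Col3, Col4)
SELECT TOP (10000) Id, Col1, Col2, Col3, Col4 FROM prod_table;
-- Restore normal identity behaviour
SET IDENTITY_INSERT dbo.dev_table OFF;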

#3


1  

You could also have used the Row Sampling transformation to extract a random sample of a fixed number of records from the overall data, rather than just getting the top 10000 rows. This would give a better sample for development/testing, since you would not be developing against only your 10000 oldest records (if your distribution is like most tables I have seen) but against a sample from across your entire table.
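
If you end up loading the rows with plain T-SQL instead of a data flow, a rough equivalent of random sampling is to order by NEWID() before taking the top rows. This is only a sketch: it swaps in a T-SQL technique for the SSIS component mentioned above, the dbo schema and the column names (Id, Col1, Col2) are assumptions, and ordering by NEWID() has to scan and sort the whole source table, so it may be slow on 7M rows.

```sql
-- Hypothetical schema and column names; adjust to the real tables.
SET IDENTITY_INSERT dbo.dev_table ON;

INSERT INTO dbo.dev_table (Id, Col1, Col2)
SELECT TOP (10000) Id, Col1, Col2
FROM dbo.prod_table
ORDER BY NEWID();   -- random ordering, so TOP returns a random 10,000-row sample

SET IDENTITY_INSERT dbo.dev_table OFF;
```

Because the sort is random, each run picks a different set of rows.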


#4


0  

Did you try closing and reopening the package? I wouldn't expect you to have to do this, though. My first thought would be that it is a security issue: that you haven't granted yourself SELECT on the view.
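
One way to test the permissions theory is to ask SQL Server directly, connected as the same login the package uses. A minimal sketch, assuming a view named dbo.vw_dev_sample and a database user named dev_user (both names are made up for illustration):

```sql
-- dbo.vw_dev_sample and dev_user are hypothetical names.
-- Returns 1 if the current login can SELECT from the view, 0 if it cannot,
-- and NULL if the object name cannot be resolved.
SELECT HAS_PERMS_BY_NAME(N'dbo.vw_dev_sample', N'OBJECT', N'SELECT') AS can_select;

-- Grant SELECT if it turns out to be missing.
GRANT SELECT ON OBJECT::dbo.vw_dev_sample TO dev_user;
```

A NULL result would point at a naming problem rather than a permissions problem.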


#5


0  

Are you using the fully qualified name for the view? Does it have a different owner than the default owner? Open up the data source and do a preview of the data to make sure it's all hooked up.
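
To see which schema the view actually landed in, something like the following catalog query may help. It is a sketch that assumes the view is called vw_dev_sample (a made-up name), and it should be run against the same database the package's connection points to:

```sql
-- vw_dev_sample is a hypothetical view name.
SELECT DB_NAME() AS database_name,
       s.name    AS schema_name,
       v.name    AS view_name
FROM sys.views AS v
JOIN sys.schemas AS s
    ON s.schema_id = v.schema_id
WHERE v.name = N'vw_dev_sample';
```

If this returns no rows, the view was probably created in a different database from the one the connection uses. If the schema is not the default one, reference the view as schema_name.view_name in the source.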

