Temporary storage of cleaned data in Integration Services

Time: 2021-08-11 16:56:17

I have an Excel file that I need to process three times in Integration Services: once for projects, once for persons, and once for time-tracking data.

Each step starts from the Excel source and needs the same data cleanup and type conversions.

Is there an easy way to create a single step that does all of this and lets me use its output as input to the other "real" steps?

I am starting to think about importing it into a temp table in SQL Server, which is perfectly fine, but it would be nice if I could skip that step.

2 solutions

#1


This can actually be achieved using a single data flow.


You can read the Excel data source once and then use a Multicast transformation to create copies of the data set in memory. Put the shared cleanup and type conversions before the Multicast so they are done only once. Each of the three data flow branches can then do its own processing, and the branches can run in parallel.

See the following reference for details:


http://msdn.microsoft.com/en-us/library/ms137701(SQL.90).aspx
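
The idea behind the Multicast approach (read once, clean once, fan out to three branches) can be sketched outside SSIS. The Python below is only a conceptual analogy, not SSIS code: the column names, the sample rows, and the functions `clean`, `load_projects`, `load_persons` and `load_time` are all hypothetical stand-ins for the Excel source and the three "real" steps.

```python
from concurrent.futures import ThreadPoolExecutor


def clean(row):
    """Shared cleanup: trim text columns and convert hours to a float."""
    return {
        "project": row["project"].strip(),
        "person": row["person"].strip(),
        "hours": float(row["hours"]),
    }


def load_projects(rows):
    """Branch 1: distinct project names."""
    return sorted({r["project"] for r in rows})


def load_persons(rows):
    """Branch 2: distinct person names."""
    return sorted({r["person"] for r in rows})


def load_time(rows):
    """Branch 3: total tracked hours."""
    return sum(r["hours"] for r in rows)


# Made-up sample standing in for the rows read once from the Excel source.
raw_rows = [
    {"project": " Alpha ", "person": "Ann ", "hours": "2.5"},
    {"project": "Beta", "person": " Bob", "hours": "4"},
    {"project": "Alpha", "person": "Bob", "hours": "1.5"},
]

# Clean once, then hand the same cleaned rows to every branch ("multicast").
cleaned = tuple(clean(r) for r in raw_rows)

# The three branches are independent, so they can run in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(branch, cleaned)
        for branch in (load_projects, load_persons, load_time)
    ]
    projects, persons, total_hours = (f.result() for f in futures)
```

The point of the sketch is that the cleanup runs exactly once and no intermediate table or file is needed. In the package itself this corresponds to placing the cleanup transformations upstream of the Multicast.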

I hope what I have detailed is clear and understandable but please feel free to contact me directly if you require further guidance.


Cheers, John

[Added in response to comments]


With regard to your further question, you can control the execution order in your package by using more than one flow. For example, you could use the Multicast transformation to create three data flows and then define precedence constraints (these are set in the control flow, between tasks) so that all transformations in flow 1 must complete before the transformations in flow 2 can begin.

#2


You could use three separate Data Flow tasks preceded by a File System task. The File System task would copy the original Excel file to a temporary area. Each of the three Data Flow tasks would then start with the temp file and write to the temp file (I think they may need to write to a copy).

The drawback is that the data flows run sequentially. That might not matter for your Excel file, but it would for larger numbers of rows. In that case it would be better to process the three "steps" in parallel and join the results at the final stage.
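
For comparison, the staged-copy approach can be sketched in the same conceptual way. This is again a Python analogy rather than SSIS code: a small CSV file stands in for the Excel workbook, and `stage_copy`, `read_rows` and the three step functions are hypothetical names chosen for illustration.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def stage_copy(source, staging_dir):
    """Copy the source file into a staging area and return the new path."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(source, staging_dir / source.name))


def read_rows(path):
    """Each step re-reads the staged file from disk."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


def projects_step(path):
    return sorted({row["project"] for row in read_rows(path)})


def persons_step(path):
    return sorted({row["person"] for row in read_rows(path)})


def time_step(path):
    return sum(float(row["hours"]) for row in read_rows(path))


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    # Made-up sample file standing in for the original Excel workbook.
    original = workdir / "timesheet.csv"
    original.write_text("project,person,hours\nAlpha,Ann,2.5\nBeta,Bob,4\n")

    staged = stage_copy(original, workdir / "staging")

    # The three steps run one after another against the staged copy.
    results = [step(staged) for step in (projects_step, persons_step, time_step)]
```

Each step opens and parses the staged file again, which is the sequential cost described above. The single-data-flow approach avoids it by reading the source only once.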
