[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

时间:2024-06-11 00:07:44

介绍

在本文中我们将利用SSIS参数,变量 以及 Foreach Loop Container 从多个源动态导入数据.

开始前我们先下载一些数据.WeatherData_Dec08_Apr09.zip  解压以后我们看到有三个目录数据

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 1

每个月年目录包含一个子目录 TH.每个TH 目录包含文件 sensor1-all.csv.  这些文件是 December 2008 至 April 2009 温度和湿度数据. February 2009 的文件包括December 2008 至February 2009 的数据, April 2009 的文件包括December 2008 至 April 2009  每个文件开始如下

Date,Time,Min T,Max T,Average T,Min H,Max H,Average H,Comfort zone,Min DP,Max DP,Average DP,Min HI,Max HI,Average HI,Low Batt
2008-12-25,19:00,8.5,10.9,9.71,32,36,33,2,-6.0,-5.0,-5.71,--,--,--
2008-12-25,20:00,6.3,8.5,7.21,36,40,38,2,-6.0,-5.0,-5.95,--,--,--
2008-12-25,21:00,5.3,6.7,6.37,39,43,40,0,-6.0,-6.0,-6.00,--,--,--

Listing 1

因为三个文件有冗余记录,所以我们要用增量导入策略. 可以查看以前的文章:

[译]Stairway to Integration Services  Level 3 - 增量导入数据

[译]Stairway to Integration Services Level 4 - 增量更新数据

[译]Stairway to Integration Services  Level 5 - 增量删除数据

因为我们的数据是向前滚动 “forward-only”的, 没有更新或者删除. 所以我们只要更新新行就可以. 我们会采用这篇文章的方式导入数据. [译]Stairway to Integration Services  Level 3 - 增量导入数据

打开 LoadWeatherData.dtsx SSIS 包 :

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 4

双击“DFT Stage Temperature and Humidity” 进去:

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 5

把 FFSrc Temperature and Humidity” 与“OLEDest Stage Temperature” 之间的data flow path 删掉 :

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 6

再拖个  Lookup Transformation :

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 7

把 “FFSrc Temperature and Humidity”连到 Lookup transformation :

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 8

打开 Lookup Transformation Editor.  “Specify how to handle rows with no matching entries” 下拉框选择 “Redirect rows to no match output”

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 9

点击 Connection 页.选择WeatherData数据库的连接. 点 “Use results of a SQL query” 输入以下查询语句:

Select [Date], [Time]
From dbo.StageTemperature

Listing  3

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 10

点击 Columns 页,映射如下 :

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 11

注: 如果我们选择 table or view, 那整个表都会载入到内存.. 如果我们用 OLE DB Connection Manager 连接类型并使用 Full Cache 模式, 缓存会在 Data Flow Task 的 PreExecute 事件开始.

从Lookup transformation 拖一个output 到 “OLEDest Stage Temperature”

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 12

选择 “Lookup No Match Output” 并确定:

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 13

先把StageTemperature 清一下

Use WeatherData
go
Truncate Table StageTemperature

Listing 2

执行一下包,你的数据流看着应该同下面一样:

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 14

再次运行, 现在应该和下图一样了 :

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 15

一开始因为没有记录所以把106条记录都导进去了,但是第二次运行以后进行匹配. 因为没新数据.所以导入数量为0

循环  Loop It!

接着我们要改动一下 “FFCM Temperature and Humidity” 连接管理器

首先把包参数里面的 SourceFilePath 删除:

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 16

接着,打开项目参数Project.params :

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 17

添加如下变量:

  • Name: SourceFolder
  • Data type: String
  • Value: < WeatherData_Dec08_Apr09.zip 的解压目录>
  • Sensitive: False
  • Required: False
  • Description: The folder containing the weather data files.

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 18

在  LoadWeatherData.dtsx  里拖一个 Foreach Loop Container:

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 19

在吧 “DFT Stage Temperature and Humidity” 拖进去,然后把Foreach Loop Container  改名为 “FOREACH Temperature File”

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 21

双击“FOREACH Temperature File” 进行配置

Collection 里面Enumerator属性选择“Foreach File Enumerator” 
Expressions  选择 Directory 并且值设置为 项目变量 SourceFolder

把 Files 值由“*.*”改为“*.csv” 来设置美剧只返回 CSV (comma-separated values) 文件. 
然后选择 “Fully qualified” 选项,这样可以接收完整的CSV文件路径 . 
勾选“Traverse subfolders” 这样可以递归查询目录:

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 27

这里的 Variable Mappings 是干嘛的?  由于Foreach File Enumerator 会递归循环我们指定的目录 ,一次返回一个CSV文件的路径,然后执行一次循环. 我们需要一个变量存储CSV文件的路径,并且逐步加载 .点击 “<New Variable…>” :

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 28

变量名   SourceFileName.

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 29

索引值默认为0

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 30

最后一部是改变“FFCM Temperature and Humidity” 平面文件连接管理器的 ConnectionString 属性  选择新的变量 @[User::SourceFileName]

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 34

在“DFT Stage Temperature and Humidity” 上加个断点,方便我们测试 :

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 35

选择 “Break when the container receives the OnPostExecute event” 选项:

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 36

选择“Break when the container receives the OnPostExecute event” 选项允许数据流任务执行一次迭代,一次迭代执行完成以后会中断 .

让我们把每个步骤分解一下:

  1. “FOREACH Temperature File” Foreach loop 进入$Project::SourceFolder  项目参数指定的目录 .
  2. 每个CSV文件被辨识,一个文件一次. 读入到 User::SourceFileName 变量.
  3. User::SourceFileName 操控 “FFCM Temperature and Humidity” 连接管理器的 ConnectionString 属性
  4. “DFT Stage Temperature and Humidity”数据流任务一次执行个 CSV 文件.
  5. T“FFSrc Temperature and Humidity” 数据源适配器 连接 “FFCM Temperature and Humidity”连接管理器指定的CSV文件 .
  6. 数据被“DFT Stage Temperature and Humidity” 数据流任务传输 .

让我们实际做一遍:

按F5

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 37

因为是在 PostExecute 事件中断,所以 “DFT Stage Temperature and Humidity” 数据流任务已经执行.  执行结果如下:

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 38

因为有106条数据之前已经存在表中,所以这次只导入了 1,877 条.

查看下变量窗口 (Debug -> Windows –> Locals )

查看User::SourceFilePath 变量,注意 sensro1-all.csv 在 Apr09目录里面:

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 40

继续运行,因为Dec08 的文件已经到如果,所以这次没有记录被导入到表中 .

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 41

再次看 Locals窗口 User::SourceFilePath 变量已经变了

[译]Stairway to Integration Services Level 16 – Flexible Source Locations (多文件导入)

Figure 42

直到第三次中断以后,整个包执行完成.

原文地址 : http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/106587/

项目文件 : My_First_SSIS_Project_After_Step_16.7z