需要基于SSIS中的其他列增加列值

时间:2022-01-21 08:53:26

I have a scenario like below.

我有如下场景。

Source Data like below (XML File)::

源数据如下(XML文件)::

ID        CatType           Name
11           A              Raj
22           A              Rai
33           B              Raki
44           B              Krish
55           A              Rem
66           B              Ram

I have to load above into below formate.

我必须把上面的装到下面。

ID       CatType        Name       LegacyID
1              A        Raj           11
2              A        Rai           22
1              B        Raki          33
2              B        Krish         44
3              A        Rem           55
3              B        Ram           66

ID and CatType are composite key in my destination table. I am getting CatType from source. While loading data, I have to increment ID by selecting Max(ID) where CatType= ?(based on CatType) in Destination table How can I load these records in SSIS, could anyone point me in the right direction?

ID和CatType是目标表中的复合键。我从源头上得到了香蒲。在加载数据时,我需要选择目标表中CatType= ?(基于CatType)的Max(ID)来增加ID,如何在SSIS中加载这些记录,谁能指出我的方向?

2 个解决方案

#1


3  

Here is a sample SSIS package created using SSIS 2008 R2 that demonstrates what you are trying to achieve. The sample package loads the incoming file data into a staging table. Then using the SQL Server Rank function in conjunction with Partition clause you can load the data as per your requirement. This sample assumes that your database version SQL Server 2005 or above. Since you didn't provide an exact XML file format, I have used CSV file format as the input files.

这里是一个SSIS包的示例,它是使用SSIS 2008 R2创建的,它演示了您想要实现的目标。示例包将传入的文件数据加载到staging表中。然后结合使用SQL Server Rank函数和Partition子句,您可以根据需要加载数据。本示例假设您的数据库版本SQL Server 2005或以上。由于您没有提供确切的XML文件格式,所以我使用了CSV文件格式作为输入文件。

  • Let's create two sample CSV files named Source_001.csv and Source_002.csv. Two files were created just to show the package logic works.
  • 让我们创建两个名为Source_001的CSV文件示例。csv和Source_002.csv。创建两个文件只是为了显示包逻辑工作。

需要基于SSIS中的其他列增加列值

需要基于SSIS中的其他列增加列值

  • In the SQL Server database, create the destination table named dbo.CategoryName. This is the final table where data will be loaded into. It has the composite key on the columns Id and CatType.
  • 在SQL Server数据库中,创建名为dbo.CategoryName的目标表。这是将数据加载到的最后一个表。它在列Id和CatType上有组合键。

需要基于SSIS中的其他列增加列值

  • In the SQL Server database, create the staging table named dbo.CategoryName_Staging. This is where the file data will be loaded into temporarily. This staging table structure mimics the incoming file structure.
  • 在SQL Server数据库中,创建名为dbo.CategoryName_Staging的staging表。这是文件数据临时加载的地方。这个staging表结构模仿传入的文件结构。

需要基于SSIS中的其他列增加列值

  • In the SQL Server database, create a stored procedure named dbo.PopulateDestination using the script provided in the section Stored Procedure Script provided in the bottom of this post. This stored procedure uses RANK function in combination with PARTITION clause to identify the correct Id that should be assigned to the CatType values.

    在SQL Server数据库中,创建一个名为dbo的存储过程。使用本文底部提供的存储过程脚本中提供的脚本填充目标。这个存储过程使用RANK函数结合PARTITION子句来识别应该分配给CatType值的正确Id。

  • Create a new SSIS package. Create an OLE DB Connection manager named SQLServer. This will point to your SQL Server database. Created a Flat File connection manager named Source.

    创建一个新的SSIS包。创建一个名为SQLServer的OLE DB连接管理器。这将指向您的SQL Server数据库。创建一个名为Source的平面文件连接管理器。

需要基于SSIS中的其他列增加列值

  • Configure the flat file connection manager as shown below. I had placed the source csv file in the path C:\temp\
  • 配置平面文件连接管理器,如下所示。我将源csv文件放在路径C:\temp\

需要基于SSIS中的其他列增加列值

  • On the Advanced tab of the Flat File Connection Manager Editor, rename the column information. LegacyId - integer, CatType - string (10), Name - string (30) the numbers denote the OutputColumnWidth.
  • 在平面文件连接管理器编辑器的高级选项卡上,重命名列信息。LegacyId - integer, CatType - string (10), Name - string(30)表示OutputColumnWidth。

需要基于SSIS中的其他列增加列值

  • On the SSIS package's control flow tab, place an Execute SQL task, followed by Data Flow Task and then followed by another Execute SQL task.
  • 在SSIS包的控制流选项卡上,放置一个执行SQL任务,然后是数据流任务,然后是另一个执行SQL任务。

需要基于SSIS中的其他列增加列值

  • Double-click on the first Execute SQL task and configure it to truncate the staging table.
  • 双击第一个执行SQL任务并将其配置为截断staging表。

需要基于SSIS中的其他列增加列值

  • Double-click on the Data Flow Task to switch to the data flow tab. Inside the data flow tab, place a Flat File Source manager to read the CSV file and place an OLE DB Destination to write the data into the staging table.
  • 双击数据流任务,切换到Data Flow选项卡。在data flow选项卡中,放置一个平面文件源管理器来读取CSV文件,并将一个OLE DB目的地写入到staging表中。

需要基于SSIS中的其他列增加列值

  • Configure the flat file source as shown below to read the flat file source using the flat file connection manager.
  • 配置平面文件源,如下所示,以使用平面文件连接管理器读取平面文件源。

需要基于SSIS中的其他列增加列值

需要基于SSIS中的其他列增加列值

  • Configure the OLE DB destinaton as shown below to accept the incoming data and write into the destination file.
  • 配置OLE DB destinaton(如下所示)以接受传入数据并写入目标文件。

需要基于SSIS中的其他列增加列值

需要基于SSIS中的其他列增加列值

  • Go back to Control flow tab, configure the last Execute SQL task to invoke the newly created stored procedure. The package development is now complete.
  • 回到Control flow选项卡,配置最后一个执行SQL任务以调用新创建的存储过程。包开发现在已经完成。

需要基于SSIS中的其他列增加列值

  • Execute the package. Remember, the package is configured to read only the first source file Source_001.csv. After the package execution, query the destination table CategoryName to find the following data.
  • 执行方案。记住,这个包被配置为只读取第一个源文件Source_001.csv。在执行包之后,查询目标表CategoryName以找到以下数据。

需要基于SSIS中的其他列增加列值

  • Now, stop the SSIS package execution, double-click on the Flat File Connection Manager named Source. Change the file name path to Source_002.csv in order to read the second file.
  • 现在,停止SSIS包的执行,双击名为Source的平面文件连接管理器。将文件名路径更改为Source_002。读取第二个文件的csv文件。

需要基于SSIS中的其他列增加列值

  • Execute the package again. The package is now configured to read the second source file Source_002.csv. This execution will append rows to the already populated desitnation table. After the package execution, query the destination table CategoryName to find the following addition data and note that the Id columns is correctly populater.
  • 再次执行方案。这个包现在被配置为读取第二个源文件Source_002.csv。此执行将向已填充的desitnation表追加行。在包执行之后,查询目标表CategoryName以查找以下添加数据,并注意Id列是正确填充的。

Hope that helps.

希望有帮助。

需要基于SSIS中的其他列增加列值

Stored Procedure Script:

存储过程脚本:

CREATE PROCEDURE [dbo].[PopulateDestination]
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.CategoryName (Id, CatType, Name, LegacyId)
    SELECT      MAXID.Id + RANK() OVER(PARTITION BY CatType ORDER BY LegacyId) Id
            ,   CS.CatType
            ,   CS.Name
            ,   CS.LegacyId
    FROM        dbo.CategoryName_Staging    CS
    CROSS APPLY (
                    SELECT  COALESCE(MAX(Id), 0) Id
                    FROM    dbo.CategoryName C
                    WHERE   C.CatType = CS.CatType
                ) MAXID
END

#2


1  

so what you really need is to create a new ID column based on cat type and change the column name from ID to legacy ID?

因此,您真正需要的是基于cat类型创建一个新的ID列,并将列名称从ID更改为遗留ID?

If yes, you can do both things using a derived column

如果是,您可以使用派生列做这两件事

#1


3  

Here is a sample SSIS package created using SSIS 2008 R2 that demonstrates what you are trying to achieve. The sample package loads the incoming file data into a staging table. Then using the SQL Server Rank function in conjunction with Partition clause you can load the data as per your requirement. This sample assumes that your database version SQL Server 2005 or above. Since you didn't provide an exact XML file format, I have used CSV file format as the input files.

这里是一个SSIS包的示例,它是使用SSIS 2008 R2创建的,它演示了您想要实现的目标。示例包将传入的文件数据加载到staging表中。然后结合使用SQL Server Rank函数和Partition子句,您可以根据需要加载数据。本示例假设您的数据库版本SQL Server 2005或以上。由于您没有提供确切的XML文件格式,所以我使用了CSV文件格式作为输入文件。

  • Let's create two sample CSV files named Source_001.csv and Source_002.csv. Two files were created just to show the package logic works.
  • 让我们创建两个名为Source_001的CSV文件示例。csv和Source_002.csv。创建两个文件只是为了显示包逻辑工作。

需要基于SSIS中的其他列增加列值

需要基于SSIS中的其他列增加列值

  • In the SQL Server database, create the destination table named dbo.CategoryName. This is the final table where data will be loaded into. It has the composite key on the columns Id and CatType.
  • 在SQL Server数据库中,创建名为dbo.CategoryName的目标表。这是将数据加载到的最后一个表。它在列Id和CatType上有组合键。

需要基于SSIS中的其他列增加列值

  • In the SQL Server database, create the staging table named dbo.CategoryName_Staging. This is where the file data will be loaded into temporarily. This staging table structure mimics the incoming file structure.
  • 在SQL Server数据库中,创建名为dbo.CategoryName_Staging的staging表。这是文件数据临时加载的地方。这个staging表结构模仿传入的文件结构。

需要基于SSIS中的其他列增加列值

  • In the SQL Server database, create a stored procedure named dbo.PopulateDestination using the script provided in the section Stored Procedure Script provided in the bottom of this post. This stored procedure uses RANK function in combination with PARTITION clause to identify the correct Id that should be assigned to the CatType values.

    在SQL Server数据库中,创建一个名为dbo的存储过程。使用本文底部提供的存储过程脚本中提供的脚本填充目标。这个存储过程使用RANK函数结合PARTITION子句来识别应该分配给CatType值的正确Id。

  • Create a new SSIS package. Create an OLE DB Connection manager named SQLServer. This will point to your SQL Server database. Created a Flat File connection manager named Source.

    创建一个新的SSIS包。创建一个名为SQLServer的OLE DB连接管理器。这将指向您的SQL Server数据库。创建一个名为Source的平面文件连接管理器。

需要基于SSIS中的其他列增加列值

  • Configure the flat file connection manager as shown below. I had placed the source csv file in the path C:\temp\
  • 配置平面文件连接管理器,如下所示。我将源csv文件放在路径C:\temp\

需要基于SSIS中的其他列增加列值

  • On the Advanced tab of the Flat File Connection Manager Editor, rename the column information. LegacyId - integer, CatType - string (10), Name - string (30) the numbers denote the OutputColumnWidth.
  • 在平面文件连接管理器编辑器的高级选项卡上,重命名列信息。LegacyId - integer, CatType - string (10), Name - string(30)表示OutputColumnWidth。

需要基于SSIS中的其他列增加列值

  • On the SSIS package's control flow tab, place an Execute SQL task, followed by Data Flow Task and then followed by another Execute SQL task.
  • 在SSIS包的控制流选项卡上,放置一个执行SQL任务,然后是数据流任务,然后是另一个执行SQL任务。

需要基于SSIS中的其他列增加列值

  • Double-click on the first Execute SQL task and configure it to truncate the staging table.
  • 双击第一个执行SQL任务并将其配置为截断staging表。

需要基于SSIS中的其他列增加列值

  • Double-click on the Data Flow Task to switch to the data flow tab. Inside the data flow tab, place a Flat File Source manager to read the CSV file and place an OLE DB Destination to write the data into the staging table.
  • 双击数据流任务,切换到Data Flow选项卡。在data flow选项卡中,放置一个平面文件源管理器来读取CSV文件,并将一个OLE DB目的地写入到staging表中。

需要基于SSIS中的其他列增加列值

  • Configure the flat file source as shown below to read the flat file source using the flat file connection manager.
  • 配置平面文件源,如下所示,以使用平面文件连接管理器读取平面文件源。

需要基于SSIS中的其他列增加列值

需要基于SSIS中的其他列增加列值

  • Configure the OLE DB destinaton as shown below to accept the incoming data and write into the destination file.
  • 配置OLE DB destinaton(如下所示)以接受传入数据并写入目标文件。

需要基于SSIS中的其他列增加列值

需要基于SSIS中的其他列增加列值

  • Go back to Control flow tab, configure the last Execute SQL task to invoke the newly created stored procedure. The package development is now complete.
  • 回到Control flow选项卡,配置最后一个执行SQL任务以调用新创建的存储过程。包开发现在已经完成。

需要基于SSIS中的其他列增加列值

  • Execute the package. Remember, the package is configured to read only the first source file Source_001.csv. After the package execution, query the destination table CategoryName to find the following data.
  • 执行方案。记住,这个包被配置为只读取第一个源文件Source_001.csv。在执行包之后,查询目标表CategoryName以找到以下数据。

需要基于SSIS中的其他列增加列值

  • Now, stop the SSIS package execution, double-click on the Flat File Connection Manager named Source. Change the file name path to Source_002.csv in order to read the second file.
  • 现在,停止SSIS包的执行,双击名为Source的平面文件连接管理器。将文件名路径更改为Source_002。读取第二个文件的csv文件。

需要基于SSIS中的其他列增加列值

  • Execute the package again. The package is now configured to read the second source file Source_002.csv. This execution will append rows to the already populated desitnation table. After the package execution, query the destination table CategoryName to find the following addition data and note that the Id columns is correctly populater.
  • 再次执行方案。这个包现在被配置为读取第二个源文件Source_002.csv。此执行将向已填充的desitnation表追加行。在包执行之后,查询目标表CategoryName以查找以下添加数据,并注意Id列是正确填充的。

Hope that helps.

希望有帮助。

需要基于SSIS中的其他列增加列值

Stored Procedure Script:

存储过程脚本:

CREATE PROCEDURE [dbo].[PopulateDestination]
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.CategoryName (Id, CatType, Name, LegacyId)
    SELECT      MAXID.Id + RANK() OVER(PARTITION BY CatType ORDER BY LegacyId) Id
            ,   CS.CatType
            ,   CS.Name
            ,   CS.LegacyId
    FROM        dbo.CategoryName_Staging    CS
    CROSS APPLY (
                    SELECT  COALESCE(MAX(Id), 0) Id
                    FROM    dbo.CategoryName C
                    WHERE   C.CatType = CS.CatType
                ) MAXID
END

#2


1  

so what you really need is to create a new ID column based on cat type and change the column name from ID to legacy ID?

因此,您真正需要的是基于cat类型创建一个新的ID列,并将列名称从ID更改为遗留ID?

If yes, you can do both things using a derived column

如果是,您可以使用派生列做这两件事