Error when adding a duplicate key to a Dictionary in a Script Task

Time: 2021-12-06 16:38:49

The script has been running well for ages, but it has suddenly started failing with the following error:

Error: 0x0 at (SCR) GetLineageIDs, ProcessDataFlowTask error:: An item with the same key has already been added.
   at System.ThrowHelper.ThrowArgumentException(ExceptionResource resource)
   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
   at ST_b90e02c5aa5e4a7a992a1a75c6255cfa.ScriptMain.ProcessDataFlowTask(TaskHost currentDataFlowTask)

The objective is to map lineage IDs to the actual column names in my SSIS package (SSIS 2012, so the lineage functionality added in 2016 is not available).

I understand that the script below is trying to add a key that already exists in my dictionary, but I'm not sure how or why it has suddenly started to error. The full script is below. I think I need some sort of if block in my ProcessDataFlowTask method. Any pointers would be gratefully received, along with an explanation of why the duplicate key error is suddenly appearing.

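using System;
using System.Collections.Generic;
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

namespace ST_b90e02c5aa5e4a7a992a1a75c6255cfa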
{
    /// <summary>
    /// ScriptMain is the entry point class of the script.  Do not change the name, attributes,
    /// or parent of this class.
    /// </summary>
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {


        Dictionary<int, string> lineageId = null;

        public void Main()
        {

            try
            {
                // Grab the executables so we have something to iterate over, and initialize our lineageIDs list
                // Why the executables?  Well, SSIS won't let us store a reference to the Package itself...
                Dts.Variables["User::execsObj"].Value = ((Package)Dts.Variables["User::execsObj"].Parent).Executables;
                Dts.Variables["User::lineageIds"].Value = new Dictionary<int, string>();
                lineageId = (Dictionary<int, string>)Dts.Variables["User::lineageIds"].Value;

                Executables execs = (Executables)Dts.Variables["User::execsObj"].Value;

                ReadExecutables(execs);

                Dts.TaskResult = (int)ScriptResults.Success;

            }
            catch (Exception ex)
            {
                //An error occurred.  
                Dts.Events.FireError(0, "SSIS variable read error:", ex.Message + "\r" + ex.StackTrace, String.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }  

        }

        private void ReadExecutables(Executables executables)
        {

            try
            {

                foreach (Executable pkgExecutable in executables)
                {
                    if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.TaskHost)))
                    {
                        TaskHost pkgExecTaskHost = (TaskHost)pkgExecutable;
                        if (pkgExecTaskHost.CreationName.StartsWith("SSIS.Pipeline"))
                        {
                            ProcessDataFlowTask(pkgExecTaskHost);
                        }
                    }
                    else if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.ForEachLoop)))
                    {
                        // Recurse into FELCs
                        ReadExecutables(((ForEachLoop)pkgExecutable).Executables);
                    }
                }

            }
            catch (Exception ex)
            {
                //An error occurred.  
                Dts.Events.FireError(0, "ReadExecutables error:", ex.Message + "\r" + ex.StackTrace, String.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }  

        }

        private void ProcessDataFlowTask(TaskHost currentDataFlowTask)
        {

            try
            {

                MainPipe currentDataFlow = (MainPipe)currentDataFlowTask.InnerObject;
                foreach (IDTSComponentMetaData100 currentComponent in currentDataFlow.ComponentMetaDataCollection)
                {
                    // Get the inputs in the component.
                    foreach (IDTSInput100 currentInput in currentComponent.InputCollection)
                        foreach (IDTSInputColumn100 currentInputColumn in currentInput.InputColumnCollection)
                           lineageId.Add(currentInputColumn.ID, currentInputColumn.Name);


                    // Get the outputs in the component.
                    foreach (IDTSOutput100 currentOutput in currentComponent.OutputCollection)
                        foreach (IDTSOutputColumn100 currentoutputColumn in currentOutput.OutputColumnCollection)
                            lineageId.Add(currentoutputColumn.ID, currentoutputColumn.Name);

                }

            }
            catch (Exception ex)
            {
                //An error occurred.  
                Dts.Events.FireError(0, "ProcessDataFlowTask error:", ex.Message + "\r" + ex.StackTrace, String.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }  


        }
        #region ScriptResults declaration
        /// <summary>
        /// This enum provides a convenient shorthand within the scope of this class for setting the
        /// result of the script.
        /// 
        /// This code was generated automatically.
        /// </summary>
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

    }


}

1 Solution

#1



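The 'duplicate key' error occurs when two columns in the package end up with the same dictionary key. Your script keys on the column ID, and as far as I know those IDs are only unique within a single data flow, so a column in a second data flow can reuse an ID that has already been added. I had this problem before and solved it with the following 2 steps.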

Step 1
Make the following change to the ProcessDataFlowTask method at both calls to lineageId.Add(...). Because the key becomes a string, the dictionary (including the one stored in User::lineageIds) has to be declared as Dictionary<string, string> instead of Dictionary<int, string>.

At the second call, remember to use currentoutputColumn.ID and currentoutputColumn.Name instead of the currentInputColumn properties.

Change from


lineageId.Add(currentInputColumn.ID, currentInputColumn.Name);

To

string strNewID = currentDataFlowTask.Name + "_" + currentInputColumn.ID.ToString();
lineageId.Add(strNewID, currentInputColumn.Name);

This makes each key unique across data flows by prefixing the column ID with the Data Flow Task name and an underscore ('_').
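To illustrate why the prefix helps, here is a minimal standalone sketch that does not need the SSIS assemblies. The class, the MakeKey and BuildLookup helpers, and the data flow and column names are all made up for illustration. It assumes, as the fix above implies, that the same column ID can appear in more than one data flow.

```csharp
using System;
using System.Collections.Generic;

public static class LineageKeyDemo
{
    // Hypothetical helper: builds the composite key described in Step 1.
    public static string MakeKey(string dataFlowName, int columnId)
    {
        return dataFlowName + "_" + columnId.ToString();
    }

    // Builds a lookup from (data flow name, column ID, column name) triples.
    public static Dictionary<string, string> BuildLookup(
        IEnumerable<Tuple<string, int, string>> columns)
    {
        var lookup = new Dictionary<string, string>();
        foreach (var column in columns)
        {
            // Add still throws on a duplicate key, but the key now includes
            // the data flow name, so equal IDs in different flows no longer clash.
            lookup.Add(MakeKey(column.Item1, column.Item2), column.Item3);
        }
        return lookup;
    }

    // Two made-up data flows that both contain a column with ID 42.
    public static string Demo()
    {
        var columns = new List<Tuple<string, int, string>>
        {
            Tuple.Create("DFT Load Customers", 42, "CustomerName"),
            Tuple.Create("DFT Load Orders", 42, "OrderDate")
        };
        var lookup = BuildLookup(columns);
        return lookup[MakeKey("DFT Load Orders", 42)]; // "OrderDate"
    }
}
```

Keyed by the integer ID alone, the second entry in Demo would raise the same "An item with the same key has already been added" exception as in the question.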


Step 2
Because the keys stored in the lineageIDs collection now carry the prefix, the lookup has to use the same prefixed key (Data Flow Task name plus '_') in the Input0_ProcessInputRow method of your other script. That is the script that consumes the dictionary (a Script Component, since that is where Input0_ProcessInputRow lives), which is required but which you have not posted above.

    string newColNum = "DataFlowTaskName_" + Row.ErrorColumn.Value.ToString();
    if (lineageIDs.ContainsKey(newColNum))
        Row.ErrorColumnName = lineageIDs[newColNum];


Note: in the code above, DataFlowTaskName_ is a hard-coded placeholder. Replace it with the name of the Data Flow Task that contains your second script. The task name is not available inside Input0_ProcessInputRow, so it has to be hard-coded.

This is just one way of doing it. You may find another way of handling the duplicates, but this is what I did and it worked.
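As one possible alternative, closer to the "if block" the question asks about, you could guard each Add with a ContainsKey check and keep the integer keys. This is only a sketch: SafeAddDemo and AddIfMissing are hypothetical names, not part of the SSIS API.

```csharp
using System;
using System.Collections.Generic;

public static class SafeAddDemo
{
    // Adds the pair only if the key is new.
    // Returns false when the ID was already present, leaving the first name in place.
    public static bool AddIfMissing(Dictionary<int, string> map, int id, string name)
    {
        if (map.ContainsKey(id))
        {
            return false;
        }
        map.Add(id, name);
        return true;
    }
}
```

In ProcessDataFlowTask this would replace lineageId.Add(currentInputColumn.ID, currentInputColumn.Name) with SafeAddDemo.AddIfMissing(lineageId, currentInputColumn.ID, currentInputColumn.Name). The trade-off is that the exception goes away but a colliding ID from a later data flow is silently dropped, so a lookup for that column would return the name from the first data flow. The prefixed key from Step 1 avoids that.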


Hope this helps.

