SSIS: How to filter rows with one or more null columns and identify which columns?

Time: 2021-12-09 23:54:37

I hope someone can help me with this. I am extracting data from Excel and loading it into an OLE DB destination using SSIS. Before the data enters the database, I have to filter out rows with invalid or null columns and store these error rows in a separate error database.

This is the data in my TransactionRecord Excel file:

CustID  TransactionDate  TransactionTime  AmountSpent
123     1/2/2011         10:30            $1
(null)  3/4/2012         (null)           $8
789     3/4/2011         12:00            $7
698     (null)           11:23            $5

*(null) represents empty fields in Excel.

Currently, this is what I have done in SSIS:

TransactionRecord.xlsx ---> Conditional Split --(Case 1:filter rows with null)--> ErrorDB
                                                                      --(default output)---> TransactionDB

I am only able to filter out rows with null values with this condition:
ISNULL(CustID) || ISNULL(TransactionDate) || ISNULL(TransactionTime) || ISNULL(AmountSpent)

However, with this method I am unable to identify which columns contain the null values. I am thinking of adding an ErrorMsg column to ErrorDB that states which columns need to be corrected.

ErrorDB:

CustID  TransactionDate   TransactionTime   AmountSpent   ErrorMsg
null    3/4/2012          null              $8            CustIDNull, TimeNull
698     null              11:23             $5            DateNull

I have tried to use a "Derived Column" transformation to add a new ErrorMsg column; however, I am unable to pinpoint which columns have the errors.

Is there a better way to extract these error columns and store them in a database?

(I can't post an image because I am new to *, and thus do not have enough reputation points.)

3 solutions

#1


0  

If you want your ErrorMsg column to contain only the first column error found, add the following expression to your Derived Column:

ISNULL(CustID) ? "CustID Error" : 
  ISNULL(TransactionDate) ? "TransactionDate Error" :
    ISNULL(TransactionTime) ? "TransactionTime Error" : 
      ISNULL(AmountSpent) ? "AmountSpent Error" : "Unknown Error" 

If you want a list of all column errors, use something like this:

TRIM((ISNULL(CustID) ? "CustID " : "") +
(ISNULL(TransactionDate) ? "TransactionDate " : "") +
(ISNULL(TransactionTime) ? "TransactionTime " : "") +
(ISNULL(AmountSpent) ? "AmountSpent" : ""))
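
To check what the list-style expression should produce before wiring it into the package, the same logic can be simulated outside SSIS. The following is only a minimal Python sketch, not SSIS code: the `rows` list re-types the sample data from the question with `None` standing in for an empty Excel cell, and the names `error_msg` and `split_rows` are made up for illustration.

```python
# Sample rows from the question; None stands in for an empty Excel cell.
rows = [
    {"CustID": 123, "TransactionDate": "1/2/2011", "TransactionTime": "10:30", "AmountSpent": "$1"},
    {"CustID": None, "TransactionDate": "3/4/2012", "TransactionTime": None, "AmountSpent": "$8"},
    {"CustID": 789, "TransactionDate": "3/4/2011", "TransactionTime": "12:00", "AmountSpent": "$7"},
    {"CustID": 698, "TransactionDate": None, "TransactionTime": "11:23", "AmountSpent": "$5"},
]

# Columns are checked in the same order as in the SSIS expression.
COLUMNS = ["CustID", "TransactionDate", "TransactionTime", "AmountSpent"]


def error_msg(row):
    """Return a space-separated list of the columns that are null in this row."""
    return " ".join(name for name in COLUMNS if row[name] is None)


def split_rows(rows):
    """Mimic the Conditional Split: clean rows go one way, rows with nulls the other."""
    good, bad = [], []
    for row in rows:
        msg = error_msg(row)
        if msg:
            bad.append({**row, "ErrorMsg": msg})
        else:
            good.append(row)
    return good, bad


good, bad = split_rows(rows)
for row in bad:
    print(row["CustID"], "->", row["ErrorMsg"])
```

With the sample data, two rows end up in `bad`: one flagged for CustID and TransactionTime, the other for TransactionDate.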

Alternatively, send your errors through a Script Component transformation; you could then have it write out one row for every error column.

#2


0  

  1. Use a Multicast transformation to branch off from your main data flow

  2. Use an asynchronous Script Component to loop through all columns and write a new row for each row/column that has a null value

  3. Write the resulting data flow to your ErrorDB

The code for this script would look something like this:

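' Assumed to run inside the script component's row handler, where Row is the current input row.
' The declarations below are not in the original snippet and are added so the names are defined.
Dim rowType As Type = Row.GetType()
Dim sColumnName As String
Dim sColumnNameClean As String
Dim sColValue As String

' Loop through all input columns
For Each column As IDTSInputColumn100 In Me.ComponentMetaData.InputCollection(0).InputColumnCollection

    ' Get column name
    sColumnName = column.Name

    ' Get column value; reading a null column throws, which the Catch block handles
    Try

        ' Clean up column name (VB/SSIS will do this when processing columns,
        ' so we need to ask for the value using the cleaned-up column name)
        sColumnNameClean = column.Name.Trim().Replace(" ", "").Replace(".", "").Replace(":", "").Replace("-", "")

        ' Get column value
        sColValue = rowType.GetProperty(sColumnNameClean).GetValue(Row, Nothing).ToString()

    Catch

        ' Add a call here to the function that creates the error row, referencing sColumnName

    End Try

Next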
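
The idea behind the steps above (one output row per null column) can be tried out independently of SSIS. The following is a rough Python sketch of that unpivoting logic only. It does not use the SSIS object model; the `rows` data re-types the question's sample with `None` for empty cells, and `unpivot_null_columns`, `RowNumber` and `ErrorColumn` are illustrative names, not part of any SSIS API.

```python
# Sample rows from the question; None stands in for an empty Excel cell.
rows = [
    {"CustID": 123, "TransactionDate": "1/2/2011", "TransactionTime": "10:30", "AmountSpent": "$1"},
    {"CustID": None, "TransactionDate": "3/4/2012", "TransactionTime": None, "AmountSpent": "$8"},
    {"CustID": 789, "TransactionDate": "3/4/2011", "TransactionTime": "12:00", "AmountSpent": "$7"},
    {"CustID": 698, "TransactionDate": None, "TransactionTime": "11:23", "AmountSpent": "$5"},
]


def unpivot_null_columns(rows):
    """Return one error record per null cell, identified by row number and column name."""
    errors = []
    for row_number, row in enumerate(rows, start=1):
        for column_name, value in row.items():
            if value is None:
                errors.append({"RowNumber": row_number, "ErrorColumn": column_name})
    return errors


errors = unpivot_null_columns(rows)
for error in errors:
    print(error["RowNumber"], error["ErrorColumn"])
```

The row number is used as the identifier here because CustID itself can be null. An asynchronous script component would add a row to its output buffer at the point where this sketch appends to `errors`.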

#3


0  

I used your data to create an Excel 2010 file.

Then I created two tables:

CREATE TABLE [123XLSX] (
    [CustID] INT  NOT NULL,
    [TransactionDate] datetime NOT NULL,
    [TransactionTime] datetime NOT NULL,
    [AmountSpent] money NOT NULL
)

CREATE TABLE [123XLSXError] (
    [CustID] VARCHAR(50) NULL,
    [TransactionDate]  VARCHAR(50) NULL,
    [TransactionTime]  VARCHAR(50) NULL,
    [AmountSpent]  VARCHAR(50) NULL,
    [ErrorCode] int,
    [ErrorColumn] int
)

Now connect your Excel source to an OLE DB destination (table [123XLSX]). From this destination, send the error output to another OLE DB destination (table [123XLSXError]).

The result:

  SELECT * FROM [dbo].[123XLSX]
  SELECT * FROM [dbo].[123XLSXError]

CustID TransactionDate         TransactionTime         AmountSpent
------ ----------------------- ----------------------- ------------
123    2011-01-02 00:00:00.000 1899-12-30 10:30:00.000 1.00
789    2011-03-04 00:00:00.000 1899-12-30 12:00:00.000 7.00

CustID TransactionDate     TransactionTime     AmountSpent ErrorCode   ErrorColumn
------ ------------------- ------------------- ----------- ----------- -----------
NULL   2012-03-04 00:00:00 NULL                8           -1071607683 41
698    NULL                1899-12-30 11:23:00 5           -1071607683 42

Although this is not the exact solution, it gives you the rows that failed along with their field values.

If you want to further polish this result, here are a few good examples. If you need help, please let us know.

https://naseermuhammed.wordpress.com/tips-tricks/getting-error-column-name-in-ssis/

http://dougbert.com/blog/post/Adding-the-error-column-name-to-an-error-output.aspx
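
The ErrorColumn values in the error table are numeric column IDs rather than names. As a rough illustration of the final lookup step only, here is a Python sketch that assumes an ID-to-name mapping has already been obtained somehow. The `ERROR_COLUMN_NAMES` dictionary is hypothetical (the pairing of 41 and 42 with column names is a guess based on which field is NULL in each error row), and `with_error_column_name` is a made-up helper, not an SSIS function.

```python
# Hypothetical mapping from ErrorColumn ID to column name. In a real package
# this has to come from the package metadata; these pairs are assumed.
ERROR_COLUMN_NAMES = {41: "CustID", 42: "TransactionDate"}

# The two error rows from the result above, re-typed by hand.
error_rows = [
    {"CustID": None, "TransactionDate": "2012-03-04 00:00:00", "TransactionTime": None,
     "AmountSpent": "8", "ErrorCode": -1071607683, "ErrorColumn": 41},
    {"CustID": "698", "TransactionDate": None, "TransactionTime": "1899-12-30 11:23:00",
     "AmountSpent": "5", "ErrorCode": -1071607683, "ErrorColumn": 42},
]


def with_error_column_name(row, names):
    """Return a copy of the row with a readable ErrorColumnName added."""
    name = names.get(row["ErrorColumn"], "Unknown column")
    return {**row, "ErrorColumnName": name}


named_rows = [with_error_column_name(row, ERROR_COLUMN_NAMES) for row in error_rows]
for row in named_rows:
    print(row["ErrorColumn"], row["ErrorColumnName"])
```

Note that the error output reports a single ErrorColumn per row, so a row with several null fields still shows only one ID.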
