Reading Excel cell values with an SSIS Script Task

Date: 2022-03-26 20:55:08

I am trying to read an Excel file via an SSIS Script Task to check for certain cell values in that worksheet.

In the code example below, strSQL uses the range "H4:H4" so that only one cell is read. This cell can only hold a true or false value. Since I also need to check for a certain string value in B1, I wanted to extend this version.

  string filePath = "c:\\test\\testBoolean.XLSX";
  string tabName = "testSheet$";
  string strSQL = "Select * From [" + tabName + "H4:H4]";
  String strCn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source="
                  + filePath + ";Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\";";
  OleDbConnection cn = new OleDbConnection(strCn);
  int iCnt = 0;
  OleDbDataAdapter objAdapter = new OleDbDataAdapter(strSQL, cn);
  DataSet ds = new DataSet();
  objAdapter.Fill(ds, tabName);
  DataTable dt = ds.Tables[tabName];

  foreach (DataRow row in dt.Rows)
  {
      iCnt = iCnt + 1;
      // some processing....
  }

What I don't understand is why I get a boolean value with the above strSQL statement, or with any statement whose range stays within the same row, like so:

  string strSQL = "Select * From [" + tabName + "F4:H4]";

Debug output:

  row.ItemArray[2]    false   object {bool}

But when I set a different range like this one:

  string strSQL = "Select * From [" + tabName + "F1:H4]";

I lose the recognition of the bool value:

  row.ItemArray[2]   "FALSE"  object {string}

I would much rather use the bool value for other processing tasks.

How can I fix this while also reading the B2 value?

2 answers

#1


3  

Your connection string specifies IMEX=1, which tells the driver to treat intermixed data types as text. (See the "Usage Considerations" section of the MSDN article "Excel Connection Manager".)

Thus, when you specified a single row

  string strSQL = "Select * From [" + tabName + "F4:H4]";

there was only one possible data type for the third column, and the driver was able to infer it correctly. However, when you specified multiple rows

  string strSQL = "Select * From [" + tabName + "F1:H4]";

and at least one value in the range H1:H4 was not a bool, the driver converted all values in that column to strings.
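
If a multi-row range has to stay in place, one possible workaround is to normalize the value after reading it, since it may arrive either as a bool or as the text "FALSE"/"TRUE". The following is only a minimal sketch; the class and method names (CellValue, ToBool) are hypothetical and not part of the original code.

```csharp
using System;

public static class CellValue
{
    // Hypothetical helper: accepts either a boxed bool or its text form
    // ("TRUE"/"FALSE" in any casing) and returns null for anything else.
    public static bool? ToBool(object value)
    {
        if (value is bool)
        {
            return (bool)value;
        }

        // bool.TryParse ignores case and surrounding whitespace.
        string text = value as string;
        bool parsed;
        if (text != null && bool.TryParse(text, out parsed))
        {
            return parsed;
        }

        return null; // DBNull, numbers, or unrelated text
    }
}
```

Inside the existing foreach loop this could be called as `bool? flag = CellValue.ToBool(row.ItemArray[2]);`, which yields the same result whether the driver delivered a bool or a string.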

Assuming that you do in fact have mixed data types in column H and only care about the values in two particular cells, the simplest solution is to query each cell individually. See "Import a single Excel cell into SSIS" for some ideas on how to do that.
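
Querying each cell individually could look roughly like the sketch below. It reuses the connection string from the question; the class and method names (ExcelCellReader, BuildCellQuery, ReadCell) are assumptions made for illustration, and the code has not been run against the asker's workbook.

```csharp
using System;
using System.Data.OleDb;

public static class ExcelCellReader
{
    // Builds a one-cell range query such as: Select * From [testSheet$H4:H4]
    public static string BuildCellQuery(string tabName, string cell)
    {
        return "Select * From [" + tabName + cell + ":" + cell + "]";
    }

    // Hypothetical helper: returns the raw value of a single cell.
    // With HDR=NO the cell is the first column of the first row.
    public static object ReadCell(string filePath, string tabName, string cell)
    {
        string strCn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source="
                       + filePath + ";Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\";";

        using (OleDbConnection cn = new OleDbConnection(strCn))
        using (OleDbCommand cmd = new OleDbCommand(BuildCellQuery(tabName, cell), cn))
        {
            cn.Open();
            // ExecuteScalar returns the first column of the first row,
            // or null when the result set is empty.
            return cmd.ExecuteScalar();
        }
    }
}
```

With this in place, `ExcelCellReader.ReadCell(filePath, tabName, "H4")` and `ExcelCellReader.ReadCell(filePath, tabName, "B1")` would each issue their own single-cell query, so the type of one cell is inferred without regard to the other values in its column.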

#2


1  

I would clone most of the code to produce two separate SELECT statements, one for each of the two cells you are after.

Actually, I would probably go further and break the whole script up into SSIS components, e.g. Execute SQL Tasks or Data Flow Tasks.
