通过OledbCommand在Excel-sheet的第一列中检索非空行

时间:2021-05-06 09:20:34

my requirement is to read all rows of an excel-sheet in first column that are not empty, are numeric and have a length between 15 and 20.

我的要求是读取精练表的所有行,这些行不是空的,是数字的,长度在15到20之间。

For example:

例如:

358218033354974
359473035499561
358218036156129
354022038366247
358218032490035
359473030516492
353210040325399

This column might have a header that is not numeric and there might be empty rows. This would cause an exception when importing them via SqlBulkCopy. So i want to prefilter the correct rows via OleDbCommand.

这个列可能有一个不是数字的头,并且可能有空行。这将导致在通过SqlBulkCopy导入它们时出现异常。因此,我希望通过OleDbCommand预过滤正确的行。

What i have tried so far is following(C# is also appreciated):

我到目前为止所做的尝试如下(c#也很感谢):

Using connection As New OleDbConnection(sExcelConnectionString)
   connection.Open()
   Dim schemaTable As DataTable = _
       connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, _
       New Object() {Nothing, Nothing, Nothing, "TABLE"})
   For Each row As DataRow In schemaTable.Rows
      'loop every Worksheet
      Dim OleDbCmd As OleDbCommand = New OleDbCommand("SELECT * FROM [" & row("TABLE_NAME").ToString & "]", connection)
      Dim dr As OleDbDataReader = OleDbCmd.ExecuteReader()
      Dim bulkCopy As SqlBulkCopy = New SqlBulkCopy(sSqlConnectionString)
      bulkCopy.DestinationTableName = destTable
      bulkCopy.WriteToServer(dr)
   Next
End Using

But this throws an exception if there are empty rows or the format of the value is incorrect. So my question is:

但如果存在空行或值的格式不正确,则会抛出异常。我的问题是:

Q: How to restrict the rows of OleDbCommand to:

问:如何将OleDbCommand的行限制为:

  • get only first column of every worksheet
  • 只获取每个工作表的第一列
  • all values that are numeric
  • 所有值都是数值
  • skip empty values
  • 跳过空值
  • ideally only the values with a length between 15 and 20
  • 理想情况下,只有长度在15到20之间的值

Edit: if somebody could only show me a way how to skip the empty rows, i would be happy. Or do i have to to select the whole datatable? I hoped it would be possible to do that only with one query because of performance reasons.

编辑:如果有人能告诉我如何跳过空行,我会很高兴。还是必须选择整个datatable?我希望由于性能的原因,只有一个查询可以做到这一点。

Thank you in advance.

提前谢谢你。

2 个解决方案

#1


1  

This is my solution, maybe it'll help somebody in some way:

这是我的解决方案,也许它会在某些方面帮助某些人:

Using connection As New OleDbConnection(sExcelConnectionString)
    connection.Open()
    Dim schemaTable As DataTable = _
        connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, _
        New Object() {Nothing, Nothing, Nothing, "TABLE"})
    Dim schemaColTable As DataTable = _
        connection.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, _
        New Object() {Nothing, Nothing, Nothing, Nothing})
    Dim firstWorkSheet As String = schemaTable.Rows(0)("TABLE_NAME").ToString
    Dim firstColumnName As String = schemaColTable.Rows(0)("COLUMN_NAME").ToString
    Dim OleDbSQL As String = String.Format( _
        "SELECT TRIM([{1}]) AS IMEI " & _
        "FROM [{0}] " & _
        "WHERE LEN(TRIM([{1}])) BETWEEN 10 AND 15 " & _
        "ORDER BY [{1}]", firstWorkSheet, firstColumnName)
    Dim OleDbCmd As OleDbCommand = New OleDbCommand(OleDbSQL, connection)
    Using bulkCopy As New SqlBulkCopy(sSqlConnectionString)
         bulkCopy.DestinationTableName = destTable
         bulkCopy.WriteToServer(OleDbCmd.ExecuteReader)
    End Using
End Using

Convert to C#

转换为c#

#2


0  

I do not see why you should not use SQL:

我不明白为什么不应该使用SQL:

 "SELECT F1 from [Sheet1$] WHERE Len(F1)>14 AND Len(F1)<21 AND IsNumeric(F1)"

When HDR is set to NO in the connection string, fields names are assigned as F1, F2 etc, numbering from the first column of the selection. It is also possible to use [Sheet1$A:A] if you are sure that there is data in column A. Note that both named ranges and sheets are returned in the table schema.

当连接字符串中的HDR设置为NO时,字段名被分配为F1、F2等,从选择的第一列开始编号。如果您确定在A列中有数据,也可以使用[Sheet1$A:A]。注意,在表模式中返回了已命名的范围和表。

#1


1  

This is my solution, maybe it'll help somebody in some way:

这是我的解决方案,也许它会在某些方面帮助某些人:

Using connection As New OleDbConnection(sExcelConnectionString)
    connection.Open()
    Dim schemaTable As DataTable = _
        connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, _
        New Object() {Nothing, Nothing, Nothing, "TABLE"})
    Dim schemaColTable As DataTable = _
        connection.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, _
        New Object() {Nothing, Nothing, Nothing, Nothing})
    Dim firstWorkSheet As String = schemaTable.Rows(0)("TABLE_NAME").ToString
    Dim firstColumnName As String = schemaColTable.Rows(0)("COLUMN_NAME").ToString
    Dim OleDbSQL As String = String.Format( _
        "SELECT TRIM([{1}]) AS IMEI " & _
        "FROM [{0}] " & _
        "WHERE LEN(TRIM([{1}])) BETWEEN 10 AND 15 " & _
        "ORDER BY [{1}]", firstWorkSheet, firstColumnName)
    Dim OleDbCmd As OleDbCommand = New OleDbCommand(OleDbSQL, connection)
    Using bulkCopy As New SqlBulkCopy(sSqlConnectionString)
         bulkCopy.DestinationTableName = destTable
         bulkCopy.WriteToServer(OleDbCmd.ExecuteReader)
    End Using
End Using

Convert to C#

转换为c#

#2


0  

I do not see why you should not use SQL:

我不明白为什么不应该使用SQL:

 "SELECT F1 from [Sheet1$] WHERE Len(F1)>14 AND Len(F1)<21 AND IsNumeric(F1)"

When HDR is set to NO in the connection string, fields names are assigned as F1, F2 etc, numbering from the first column of the selection. It is also possible to use [Sheet1$A:A] if you are sure that there is data in column A. Note that both named ranges and sheets are returned in the table schema.

当连接字符串中的HDR设置为NO时,字段名被分配为F1、F2等,从选择的第一列开始编号。如果您确定在A列中有数据,也可以使用[Sheet1$A:A]。注意,在表模式中返回了已命名的范围和表。