I have an Excel 2007 workbook containing tables of data that I'm importing into DataTable objects using ADO.NET.
Through some experimentation, I've found two different ways to indicate that a cell should be treated as "null" by ADO.NET:
- The cell is completely blank.
- The cell contains #N/A.
Unfortunately, both of these are problematic:
- Most of my columns of data in Excel are generated via formulas, but Excel has no way to write a formula that results in a completely blank cell. And only a completely blank cell is considered null (an empty string will not work).
- Any formula that evaluates to #N/A (either due to an actual lookup error or because the NA() function was used) is considered null. This seemed like the ideal solution until I discovered that the Excel workbook must be open for this to work. As soon as you close the workbook, OLEDB suddenly starts seeing all those #N/A values as strings. This causes exceptions like the following to be thrown when filling the DataTable:
    Input string was not in a correct format. Couldn't store <#N/A> in Value Column. Expected type is Int32.
Question: How can I indicate a null value via an Excel formula without having to have the workbook open when I fill the DataTable? Or what can be done to make #N/A values be considered null even when the workbook is closed?
In case it's important, my connection string is built using the following method:
    // Requires: using System.Data.OleDb;
    var builder = new OleDbConnectionStringBuilder
    {
        Provider = "Microsoft.ACE.OLEDB.12.0",
        DataSource = _workbookPath
    };
    // HDR=Yes treats the first row as column headers; IMEX selects the import mode.
    builder.Add("Extended Properties", "Excel 12.0 Xml;HDR=Yes;IMEX=0");
    return builder.ConnectionString;
(_workbookPath is the full path to the workbook.)
I've tried both IMEX=0 and IMEX=1, but it makes no difference.
1 solution
You're hitting the brick wall that many very frustrated Excel users run into. Excel is widespread as a company tool and seems quite robust, but because every cell, column, and row has a variant data type, it is a nightmare to handle from other tools such as MySQL, SQL Server, R, RapidMiner, SPSS, and the list goes on. Excel 2007/2010 does not seem to be very well supported, even more so once 32-bit and 64-bit versions are taken into account, which is scandalous in this day and age.
The main problem is that when ACE/Jet reads each field in Excel, it uses the registry setting 'TypeGuessRows' to decide how many rows to sample when assessing the data type. The default is 8 rows. 'TypeGuessRows' accepts an integer from 1 to 16, or 0 to scan all existing rows. If you can't change the registry (as in, say, 90% of office environments), life gets difficult because type guessing is limited to the first 8 rows.
For example, without the registry change: if the first occurrence of #N/A is within the first 8 rows, then IMEX=1 returns the error as the string "#N/A", whereas IMEX=0 returns null.
If the first occurrence of #N/A is beyond the first 8 rows, then both IMEX=0 and IMEX=1 return null (assuming the required data type is numeric).
With the registry change (TypeGuessRows = 0), all should be fine.
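To see which of these cases applies on a given machine, it can help to check the current TypeGuessRows value first. The following is only a minimal sketch: the registry path is an assumption for the ACE 12.0 Excel driver (Jet 4.0 uses a different key, and a 32-bit driver on 64-bit Windows is typically found under Wow6432Node), and the class and method names are hypothetical. It is Windows-only and needs read access to HKEY_LOCAL_MACHINE.

```csharp
using System;
using Microsoft.Win32;

static class TypeGuessRowsProbe
{
    // Assumed location of the ACE 12.0 Excel driver settings; adjust for Jet 4.0
    // or for a 32-bit driver on 64-bit Windows (Wow6432Node).
    const string AceExcelKey =
        @"SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel";

    // Turns a raw TypeGuessRows value into a description of how many rows are sampled.
    // null means the value was not found, so the default of 8 rows is assumed.
    public static string Describe(int? raw)
    {
        if (raw == null) return "8 rows (default)";
        if (raw == 0) return "all rows";
        return raw + " rows";
    }

    static void Main()
    {
        // OpenSubKey returns null when the key does not exist; 'using' tolerates that.
        using (RegistryKey key = Registry.LocalMachine.OpenSubKey(AceExcelKey))
        {
            int? raw = key?.GetValue("TypeGuessRows") as int?;
            Console.WriteLine("TypeGuessRows scans " + Describe(raw));
        }
    }
}
```

Reading the value needs no elevated rights in most setups; changing it usually does, which is why the registry route is often closed in managed office environments.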
Perhaps there are 4 options:
- Change the registry setting to TypeGuessRows = 0.
- Put every possible type variation in the first 8 rows as 'dummy data' (e.g. memo fields, nchar(max), errors such as #N/A).
- Correct ALL data type anomalies in Excel.
- Don't use Excel. Seriously worth considering!
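If none of these options is available, one further possibility is to correct the anomalies on the .NET side rather than in Excel. This is not part of the answer above, only a hedged sketch: it assumes the sheet is first loaded into a loosely typed DataTable (object columns, so the fill itself cannot fail on "#N/A") and then copied into the typed table. The names ExcelNullCleaner and ToNullableInt32 are hypothetical, and treating every value that starts with "#" as an Excel error is an assumption.

```csharp
using System;
using System.Data;
using System.Globalization;

static class ExcelNullCleaner
{
    // Maps blank cells and Excel error text (assumed to start with '#', e.g. "#N/A")
    // to DBNull; everything else is converted to Int32.
    public static object ToNullableInt32(object cell)
    {
        if (cell == null || cell is DBNull) return DBNull.Value;

        string text = cell as string;
        if (text == null)
            return Convert.ToInt32(cell, CultureInfo.InvariantCulture); // e.g. a double from the driver

        text = text.Trim();
        if (text.Length == 0 || text.StartsWith("#", StringComparison.Ordinal))
            return DBNull.Value;

        return int.Parse(text, NumberStyles.Integer, CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        // Stand-in for the loosely typed table filled from the workbook.
        var raw = new DataTable();
        raw.Columns.Add("Value", typeof(object));
        raw.Rows.Add("42");
        raw.Rows.Add("#N/A");
        raw.Rows.Add(7.0);

        // Typed target table; DataColumn.AllowDBNull is true by default.
        var typed = new DataTable();
        typed.Columns.Add("Value", typeof(int));
        foreach (DataRow row in raw.Rows)
            typed.Rows.Add(ToNullableInt32(row["Value"]));

        foreach (DataRow row in typed.Rows)
            Console.WriteLine(row.IsNull("Value") ? "null" : row["Value"].ToString());
    }
}
```

The trade-off is an extra pass over the data, but the result no longer depends on whether the workbook is open or on how the driver guessed the column type.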
Edit: Just to put the boot in :) two more things that really annoy me: if the first field on a sheet is blank over the first 8 rows and you can't edit the registry setting, the whole sheet is returned as blank (cue many fun conversations telling managers they're fools for merging cells!). Also, if a department returns an Excel 2007/2010 sheet with more than 255 columns/fields, you have huge problems if you need a non-contiguous import (e.g. key in column 1 and data in columns 255+).