I am using an OleDbConnection to query an Excel 2007 spreadsheet. I want to force the OleDbDataReader to use only string as the column data type.
The system is looking at the first 8 rows of data and inferring the data type to be Double. The problem is that on row 9 I have a string in that column, and the OleDbDataReader returns a null value because the string could not be cast to a Double.
I have used these connection strings:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source="ExcelFile.xlsx";Persist Security Info=False;Extended Properties="Excel 12.0;IMEX=1;HDR=No"
Provider=Microsoft.Jet.OLEDB.4.0;Data Source="ExcelFile.xlsx";Persist Security Info=False;Extended Properties="Excel 8.0;HDR=No;IMEX=1"
Looking at reader.GetSchemaTable().Rows[7].ItemArray[5], its DataType is Double.
Row 7 in this schema corresponds to the specific Excel column I am having issues with; ItemArray[5] is its DataType column.
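To see at a glance which type the provider inferred for each column, the schema table can be read by column name instead of by position. This is a minimal sketch: `DescribeSchema` and the hand-built `fakeSchema` are hypothetical stand-ins (a real table would come from `reader.GetSchemaTable()`), and the names `F8`/`F9` only mimic the `F1`, `F2`, ... names the provider assigns when `HDR=No`.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Common;

// Hypothetical helper: list "ColumnName: DataType" for every row of a schema
// table, looking up the schema columns by name rather than by index.
List<string> DescribeSchema(DataTable schema)
{
    var lines = new List<string>();
    foreach (DataRow row in schema.Rows)
    {
        var name = (string)row[SchemaTableColumn.ColumnName];
        var type = (Type)row[SchemaTableColumn.DataType];
        lines.Add($"{name}: {type.Name}");
    }
    return lines;
}

// Hand-built stand-in for reader.GetSchemaTable(); the real table has many
// more columns (ColumnOrdinal, ColumnSize, AllowDBNull, ...).
var fakeSchema = new DataTable();
fakeSchema.Columns.Add(SchemaTableColumn.ColumnName, typeof(string));
fakeSchema.Columns.Add(SchemaTableColumn.DataType, typeof(Type));
fakeSchema.Rows.Add("F8", typeof(double));
fakeSchema.Rows.Add("F9", typeof(string));

foreach (var line in DescribeSchema(fakeSchema))
    Console.WriteLine(line);
```

Looking columns up through `SchemaTableColumn.DataType` rather than `ItemArray[5]` keeps the check readable and does not depend on the position of the DataType column in the schema table.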
Is it possible to create a custom TableSchema for the reader so that, when accessing Excel files, I can treat all cells as text instead of letting the system infer the data type?
I found some good info on this page: Tips for reading Excel spreadsheets using ADO.NET
The main quirk about the ADO.NET interface is how datatypes are handled. (You'll notice I've been carefully avoiding the question of which datatypes are returned when reading the spreadsheet.) Are you ready for this? ADO.NET scans the first 8 rows of data, and based on that guesses the datatype for each column. Then it attempts to coerce all data from that column to that datatype, returning NULL whenever the coercion fails!
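The behavior described in that quote can be modeled in a few lines. This is a toy model written for illustration, not Jet's actual code: `ReadColumn` and `ParseNumber` are hypothetical names, the `mixedAsText` flag loosely stands in for `IMEX=1` combined with `ImportMixedTypes=Text`, and blank cells are not modeled. It suggests why `IMEX=1` alone does not help with this sheet: the first 8 rows are all numeric, so the sample never looks mixed.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Toy model of the quoted behavior, NOT Jet's real implementation:
// 1. look at the first `guessRows` cells,
// 2. choose Double if numbers outnumber text, otherwise text,
// 3. coerce every cell to that type, producing null when coercion fails.
// `mixedAsText` loosely imitates IMEX=1 with ImportMixedTypes=Text:
// a sample containing both numbers and text makes the column text.
object[] ReadColumn(string[] cells, int guessRows, bool mixedAsText)
{
    var sample = cells.Take(guessRows).ToArray();
    int numeric = sample.Count(c => ParseNumber(c).HasValue);
    int text = sample.Length - numeric;
    bool asText = text > numeric || (mixedAsText && numeric > 0 && text > 0);

    // A null double? boxes to null, which plays the role of DBNull here.
    return cells.Select(c => asText ? (object)c : ParseNumber(c)).ToArray();
}

double? ParseNumber(string s) =>
    double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out var d)
        ? (double?)d
        : null;

// Shaped like the question's column: rows 1-8 numeric, a string in row 9.
string[] column = { "1", "2", "3", "4", "5", "6", "7", "8", "ABC-9" };

var defaultScan = ReadColumn(column, guessRows: 8, mixedAsText: true);
var wideScan = ReadColumn(column, guessRows: column.Length, mixedAsText: false);
var wideScanMixed = ReadColumn(column, guessRows: column.Length, mixedAsText: true);

Console.WriteLine(defaultScan[8] == null);   // True: the 8-row sample has no text
Console.WriteLine(wideScan[8] == null);      // True: numbers still win the majority
Console.WriteLine(wideScanMixed[8]);         // ABC-9: mixed sample, so text is kept
```

In this model, the row-9 string survives only when the sample is large enough to include it and mixed samples are treated as text, which is why the answers below pair the row-count setting with the mixed-types setting.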
Thank you,
Keith
Here is a reduced version of my code:
using (OleDbConnection connection = new OleDbConnection(BuildConnectionString(dataMapper).ToString()))
{
    connection.Open();
    using (OleDbCommand cmd = new OleDbCommand())
    {
        cmd.Connection = connection;
        cmd.CommandText = "SELECT * FROM [Sheet1$]";
        using (OleDbDataReader reader = cmd.ExecuteReader())
        {
            // Not wrapped in using: the table is handed to the DataSet and must outlive this block.
            DataTable dataTable = new DataTable("TestTable");
            dataTable.Load(reader);
            base.SourceDataSet.Tables.Add(dataTable);
        }
    }
}
4 Solutions
#1
6
As you have discovered, OLE DB here goes through Jet, which offers only limited ways to tweak its behavior. If you are set on using an OleDbConnection to read from an Excel file, you need to set the HKLM\...\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows value to zero so that the engine scans more rows before guessing (for this setting, 0 means the first 16,384 rows rather than literally the entire sheet).
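Before changing anything, it can help to check what the machine currently has. A minimal read-only sketch, assuming .NET 5 or later for `OperatingSystem.IsWindows()`: `ReadJetExcelSetting` is a hypothetical helper, and it opens the 32-bit registry view because Jet 4.0 is a 32-bit-only component, which on 64-bit Windows corresponds to the `Wow6432Node` path. This key configures Jet; the ACE 12.0 provider used in the question is generally documented to read its own copy of these values under `HKLM\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel`, which the sketch does not check.

```csharp
using System;
using Microsoft.Win32;

// Hypothetical helper: read one value from the Jet 4.0 Excel engine key.
// Jet 4.0 exists only as a 32-bit component, so the 32-bit registry view is
// opened; on 64-bit Windows this is the Wow6432Node branch. Read-only.
object ReadJetExcelSetting(string valueName)
{
    if (!OperatingSystem.IsWindows())
        return null; // the registry only exists on Windows

    using var hklm = RegistryKey.OpenBaseKey(RegistryHive.LocalMachine, RegistryView.Registry32);
    using var key = hklm.OpenSubKey(@"SOFTWARE\Microsoft\Jet\4.0\Engines\Excel");
    return key?.GetValue(valueName); // null if the key or value is missing
}

foreach (var name in new[] { "TypeGuessRows", "ImportMixedTypes" })
    Console.WriteLine($"{name} = {ReadJetExcelSetting(name) ?? "(not found)"}");
```

Writing these values requires administrator rights and, as noted in another answer, changes the behavior for every application that uses the engine.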
That said, if you are open to using an alternative engine to read from an Excel file, you might consider trying ExcelDataReader. It reads all columns as strings but lets you use the dataReader.Getxxx methods to get typed values. Here's a sample that fills a DataSet:
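using System.Data;
using System.IO;
using Excel; // ExcelDataReader 2.x namespace (3.x renamed it to ExcelDataReader)

DataSet result;
const string path = @"....\Test.xlsx";
using (var fileStream = new FileStream(path, FileMode.Open, FileAccess.Read))
{
    using (var excelReader = ExcelReaderFactory.CreateOpenXmlReader(fileStream))
    {
        // Treat the first row as header names instead of data.
        excelReader.IsFirstRowAsColumnNames = true;
        result = excelReader.AsDataSet();
    }
}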
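If the values do come back as text, the typing decision can be made per cell instead of per column, so a stray string in row 9 is kept rather than nulled. A minimal sketch: `TypeCell` is a hypothetical helper, and parsing with `CultureInfo.InvariantCulture` is an assumption; a sheet written with another locale's decimal separator would need a different culture.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: decide the type of each cell on its own.
// Numbers become double, blanks become DBNull, anything else stays a string,
// so no value is thrown away because its neighbours were numeric.
object TypeCell(string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return DBNull.Value;

    return double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out var number)
        ? (object)number
        : text;
}

// A column shaped like the one in the question.
string[] cells = { "1", "2.5", "", "ABC-9" };
foreach (var cell in cells)
    Console.WriteLine($"'{cell}' -> {TypeCell(cell).GetType().Name}");
```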
#2
1
Check out the final answer on this page.
Just noticed the page you refer to says the same thing ...
Update:
The problem seems to be with the JET engine itself and not ADO. Once JET decides on the type, it sticks to it. Anything done after that has no effect; for example, casting the values to string in the SQL (e.g. CStr([Column])) just results in an empty string being returned.
At this point (if there are no other answers) I'd opt for other methods: modifying the spreadsheet; modifying the registry (not ideal, since you will be changing the settings for every other app that uses JET); Excel automation; or a third-party component that does not use JET.
If the Automation option is too slow, you could use it just to save the spreadsheet in a different format that is easier to handle.
#3
1
Note: on a 64-bit OS, the key is here:
My Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel
#4
0
I have faced the same issue and found that many people run into it. Here are a number of solutions that have been suggested, many of which I have tried:
- Add the following to your connection string (Source):
TypeGuessRows=0;ImportMixedTypes=Text
- Add the following to your connection string (Source, More Discussion, Even More):
IMEX=1;HDR=NO;
- Edit the following registry settings, setting "TypeGuessRows" to 0 and "ImportMixedTypes" to "Text" (Source, Not Recommended, More Documentation):
Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/TypeGuessRows
Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/ImportMixedTypes
- Consider using an alternative library for reading the Excel file:
  - EPPlus
  - ExcelDataReader (also suggested by @Thomas)
  - OpenXml
- Format all data in the source file as Text (at least the first 8 rows), though I understand that's typically impractical (Source; this relates to SSIS, but the concept is the same).
- Use a Schema.ini file to define the data types before importing the file. I found this in relation to using "Jet.OleDb" directly, and it may require modifying your connection string. This may only be applicable to CSV files; I have not tried this approach. (Source, Related Post)
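For the CSV route (for example after saving the sheet in a different format, as suggested in another answer), a Schema.ini file placed in the same folder as the CSV tells the Jet/ACE text driver each column's type, so nothing is guessed. A sketch that generates one, assuming a headerless file: `BuildSchemaIni`, the file name `Sheet1.csv`, and the `F1`...`Fn` column names are hypothetical. With the text driver, the connection string's `Data Source` points at the folder rather than the file.

```csharp
using System;
using System.Text;

// Hypothetical helper: Schema.ini text that declares every column of one
// CSV file as Text, so the text driver does not guess column types.
string BuildSchemaIni(string csvFileName, int columnCount, bool hasHeader)
{
    var sb = new StringBuilder();
    sb.AppendLine($"[{csvFileName}]");                  // section = file name
    sb.AppendLine("Format=CSVDelimited");
    sb.AppendLine($"ColNameHeader={(hasHeader ? "True" : "False")}");
    for (int i = 1; i <= columnCount; i++)
        sb.AppendLine($"Col{i}=F{i} Text");             // ColN=<name> <type>
    return sb.ToString();
}

Console.Write(BuildSchemaIni("Sheet1.csv", 3, hasHeader: false));
```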
None of these have worked for me (though I believe they have worked for others). I share the opinion expressed by @Asher that there is really no good solution to this problem. In my software I simply display an error message to the user (if any required column contains empty values) instructing them to format all columns as "Text".
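The validation described above might look like this. A minimal sketch: `FindColumnsWithBlanks` and the message text are hypothetical, and the check treats both `DBNull` and whitespace-only strings as empty, since a cell Jet failed to coerce arrives in the DataTable as `DBNull`.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper: names of required columns that are missing or that
// contain at least one empty cell (DBNull or whitespace-only text).
List<string> FindColumnsWithBlanks(DataTable table, IEnumerable<string> requiredColumns)
{
    var problems = new List<string>();
    foreach (var name in requiredColumns)
    {
        bool bad = !table.Columns.Contains(name) ||
                   table.Rows.Cast<DataRow>().Any(row =>
                       row.IsNull(name) || string.IsNullOrWhiteSpace(row[name].ToString()));
        if (bad)
            problems.Add(name);
    }
    return problems;
}

var table = new DataTable();
table.Columns.Add("F1", typeof(double));
table.Columns.Add("F2", typeof(string));
table.Rows.Add(1.0, "a");
table.Rows.Add(DBNull.Value, "b"); // e.g. a text cell Jet could not coerce to Double

var blanks = FindColumnsWithBlanks(table, new[] { "F1", "F2" });
if (blanks.Count > 0)
    Console.WriteLine($"Please format these columns as Text and try again: {string.Join(", ", blanks)}");
```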
Honestly, I think this book is more applicable to the situation. The issue, already stated multiple times, is:
- "The data type at the destination is varchar but the assumed data type of 'double' nullifies any data that doesn't fit." (Source)
- "But the problem is actually with the OLEDBDataReader. The problem is that if it sees mostly numbers in a column, it assumes everything is a number - if a row item being read is not a number, it simply sets it to null! Ouch!" (Source)
- "The problem seems to be with the JET engine itself and not ADO. Once JET decides on the type, it sticks to it." (@Asher)
While I haven't found any of this documented in an official capacity, I think it's very clear that this is an intentional design decision and simply how the Jet Database Library works. I hesitate to call this library entirely useless because I think some of these solutions do work for many people, but so far, for my project, I have concluded that this library cannot read multiple data types in a single column and is ill-suited for general data retrieval.