How to turn a flat file of data into a queryable data source

Time: 2021-02-11 00:27:03

I generate files, let's call them .dwrf files, which contain a significant amount of data. Currently we export those to CSV, and the resulting files are large (2GB+). I would like to cut out the export step and make the contents of a .dwrf file queryable directly from Excel or other applications.

What I would like to do is write a utility/service, let's call it dwrfMiner, that extracts data from the file and exposes it as a data source, and then link dwrfMiner to .dwrf files in some way so that Excel recognises it as an external data source.

Any ideas?

5 Solutions

#1


3  

Writing an ODBC driver for this is probably overkill. If the format of your files is known in advance and isn't too hard to translate (which it sounds like it isn't, given that you are already creating CSVs), then using an ODBC DSN sounds like your best bet.

There is a nice selection of ODBC drivers already built into Windows (.txt, .csv, .mdb, .xl*, .dbf, Paradox .db, etc.), and you can obtain drivers for a lot of other common formats from the web.

If the existing export format (CSV) is too large to be practical, the logical starting point is to transform your data into a more space-efficient format that has ODBC support.
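
As a rough illustration of that transformation, here is a minimal sketch that loads records into a SQLite database using only Python's standard library. SQLite is my choice for the example, not something the question mentions; third-party ODBC drivers for it exist. The `read_dwrf_rows` function, the `readings` table and its three columns are all hypothetical stand-ins, since the real .dwrf layout isn't described here.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple

Row = Tuple[int, str, float]


def read_dwrf_rows() -> Iterator[Row]:
    """Hypothetical stand-in for a real .dwrf parser; yields (id, name, value)."""
    yield from [(1, "alpha", 1.5), (2, "beta", 2.5), (3, "gamma", 4.0)]


def load_rows(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Create the table if needed, bulk-insert the rows, return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(id INTEGER PRIMARY KEY, name TEXT, value REAL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]


# ":memory:" keeps the example self-contained; a real export would use a file path.
conn = sqlite3.connect(":memory:")
row_count = load_rows(conn, read_dwrf_rows())
total = conn.execute("SELECT SUM(value) FROM readings").fetchone()[0]
print(row_count, total)
```

Because the rows are streamed from a generator into `executemany`, nothing forces the whole data set into memory at once, and the resulting database can be filtered and aggregated with SQL instead of being opened whole.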

Failing that, your last option is the overkill one: writing an ODBC driver.

#2


1  

Excel can query external data sources, but beware that every version of Excel has a hard limit on the number of rows it can display per worksheet. I think in Excel 2003 the limit is ~65k (65,536 rows). It's higher in later versions (1,048,576 rows from Excel 2007 onwards).
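
To make the row limit concrete, here is a small sketch that counts the records in a CSV export and checks whether they would fit in a single worksheet. The constants are the documented worksheet limits for the .xls and .xlsx formats; the function names and the sample data are made up for the example.

```python
import csv
import io

# Worksheet row limits: 65,536 for .xls (Excel 2003 and earlier),
# 1,048,576 for .xlsx (Excel 2007 and later).
XLS_MAX_ROWS = 65_536
XLSX_MAX_ROWS = 1_048_576


def count_rows(lines) -> int:
    """Count CSV records (header included) without holding them all in memory."""
    return sum(1 for _ in csv.reader(lines))


def fits_in_worksheet(row_count: int, max_rows: int = XLSX_MAX_ROWS) -> bool:
    """Return True if row_count rows fit in one worksheet with the given limit."""
    return row_count <= max_rows


# In practice `lines` would be an open file; a StringIO keeps the sketch runnable.
sample = io.StringIO("id,value\n1,1.5\n2,2.5\n")
n = count_rows(sample)
print(n, fits_in_worksheet(n), fits_in_worksheet(70_000, XLS_MAX_ROWS))
```

If the count exceeds the limit, the data has to be filtered or aggregated in the query before it reaches the sheet.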

See my question: reporting tool/viewer for large datasets (and I had much less than 2GB).

#3


0  

I have used PHP FlatFile DB to query flat files in the past.

#4


0  

I'd get out gcc and write yourself a full ODBC driver for it. Then you can sit back and use SQL.

You know, if you're bored. ;)

#5


0  

Use an ODBC driver with multithreading.
