How to load old Microsoft Office XML files using Java

Time: 2021-04-26 21:48:57

I'm not able to load an Excel file in the older Office XML format (think Office 2002 or 2003 version) into Java. I tried JXL and Apache's POI (version 3.7). POI doesn't work since it appears to want the newer Office .xlsx format.

Here's an example of the older Office XML format.

One can generate a similar XML file from MS Excel 2010 by saving the workbook in the "XML Spreadsheet 2003" format.

Are there any open-source Java libraries that will load the XMLSS format? Otherwise I have no choice but to write a custom parser: read the XML file, then interpret the cell tags to build out the cell matrix. In this XML format, cells with empty values are skipped, and the next cell with data is positioned with an Index attribute that acts like an offset into the columns, I assume to save space in the XML file.

6 solutions

#1


3  

The format is called SpreadsheetML (not to be confused with .xlsx, which is also XML-based). A library called Xelem can handle it:

import nl.fountain.xelem.excel.Workbook;
import nl.fountain.xelem.lex.ExcelReader;
//...
ExcelReader reader = new ExcelReader();
Workbook xlWorkbook = reader.getWorkbook("c:\\my\\spreadsheet.xml");
System.out.println(xlWorkbook.getSheetNames());

#2


2  

Copying Mark Beardsley's answer from the POI team (http://apache-poi.1045710.n5.nabble.com/How-to-convert-xml-to-xls-td2306602.html):

You have got an Office 2003 xml file there, not an OpenXML file; it is an early attempt by Microsoft to create an xml based file format for Excel and it is in that sense a 'valid' Office file format.

Sadly, POI cannot interpret this file at all, and that is why you saw the exception when you tried to wrap it up in an InputStream and pass it to WorkbookFactory.create(). You do, however, have a number of options:

  • You could use Excel itself and manually open and save each file you wish to convert, as you already have done.
  • If you have access to Visual Studio and can write Visual Basic or C# code, then you could use a control that will allow you to control Excel programmatically. This way you could automate a file conversion process using Excel itself. Then, once the file has been converted either to the binary or OpenXML formats, POI can be used to process it.
  • If you are running Windows on a standalone PC that has a copy of Excel installed, then you could use OLE to do something very similar from Java code. As above, POI can be used to process the file following the conversion.
  • If you have access to OpenOffice, it has a rather good API that is accessible from Java code. You could use it to convert between the file types; it is simply a matter of discovering the correct filter to use in this case. OpenOffice is good for all except the most complex files, and you should be able to use POI to process the file following conversion. However, if you choose this route, it may be best to do all of the work using OpenOffice's UNO API.
  • Depending upon what you want to do with the file's contents, you could create your own parser using core Java code and either the SAX or Xerces parsers (consider using XMLBeans, http://xmlbeans.apache.org/). If you simply open the original XML file in a plain text editor, you can see that the structure is not complex and, if all you wish to get at is the raw data it contains, this could be your best option.
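
The hand-written parser option in the list above can be sketched with the SAX parser that ships with the JDK. This is only a minimal sketch under stated assumptions, not the method of any particular library: the class name SpreadsheetMlReader and its parse method are hypothetical, every cell is read as a plain string, rows from all worksheets are appended to one list, and attributes such as ss:MergeAcross or ss:Index on Row are not handled. What it does show is how cells omitted from the file can be padded by honoring the 1-based ss:Index attribute on Cell.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reads "XML Spreadsheet 2003" cells into a list of rows.
final class SpreadsheetMlReader extends DefaultHandler {
    // Namespace of both the elements and the ss:-prefixed attributes.
    private static final String SS = "urn:schemas-microsoft-com:office:spreadsheet";

    private final List<List<String>> rows = new ArrayList<>();
    private List<String> row;      // row being filled, null outside <Row>
    private String cellValue;      // value of the cell being read
    private StringBuilder text;    // non-null only while inside a cell's <Data>
    private boolean inComment;     // cell comments carry their own <Data>; skip it

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (!SS.equals(uri)) {
            return;
        }
        if ("Row".equals(local)) {
            row = new ArrayList<>();
        } else if ("Cell".equals(local) && row != null) {
            // ss:Index is 1-based; pad the cells that the file left out.
            String index = atts.getValue(SS, "Index");
            if (index != null) {
                int target = Integer.parseInt(index) - 1;
                while (row.size() < target) {
                    row.add("");
                }
            }
            cellValue = "";
        } else if ("Comment".equals(local)) {
            inComment = true;
        } else if ("Data".equals(local) && row != null && !inComment) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (!SS.equals(uri) || row == null) {
            return;
        }
        if ("Data".equals(local) && text != null) {
            cellValue = text.toString();
            text = null;
        } else if ("Comment".equals(local)) {
            inComment = false;
        } else if ("Cell".equals(local)) {
            row.add(cellValue);
        } else if ("Row".equals(local)) {
            rows.add(row);
            row = null;
        }
    }

    static List<List<String>> parse(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required to look up ss:Index by namespace
        SpreadsheetMlReader handler = new SpreadsheetMlReader();
        factory.newSAXParser().parse(in, handler);
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: the second cell is omitted, so the third carries ss:Index="3".
        String xml =
            "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\""
          + " xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">"
          + "<Worksheet ss:Name=\"Sheet1\"><Table><Row>"
          + "<Cell><Data ss:Type=\"String\">a</Data></Cell>"
          + "<Cell ss:Index=\"3\"><Data ss:Type=\"String\">c</Data></Cell>"
          + "</Row></Table></Worksheet></Workbook>";
        System.out.println(parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
    }
}
```

For the made-up sample in main, the single row comes back with three entries, the middle one being an empty string for the skipped cell.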

#3


1  

After a lot of pain I've found a solution to this. JODConverter uses the OpenOffice.org/LibreOffice API and can convert SpreadsheetML to whatever formats OpenOffice.org supports.

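
JODConverter's own API is not shown here. As a rough stand-in for the same idea, the sketch below drives the office suite's command-line converter (soffice with --headless, --convert-to and --outdir) from Java through ProcessBuilder. The class name OfficeConvert and the assumption that soffice is on the PATH are hypothetical, and whether a given installation detects the input as SpreadsheetML on its own is an assumption worth checking against a real file.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: converts a spreadsheet to .xlsx by running soffice headlessly.
final class OfficeConvert {
    // Builds the command line; kept separate so it can be inspected without running it.
    static List<String> command(String sofficePath, File input, File outDir) {
        return Arrays.asList(
            sofficePath,
            "--headless",
            "--convert-to", "xlsx",
            "--outdir", outDir.getPath(),
            input.getPath());
    }

    // Runs the conversion and returns the exit code of the soffice process.
    static int convert(String sofficePath, File input, File outDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(sofficePath, input, outDir))
            .inheritIO() // show the converter's messages on this console
            .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: "soffice" is on the PATH; args are the input file and output directory.
        int exit = convert("soffice", new File(args[0]), new File(args[1]));
        System.out.println("soffice exited with " + exit);
    }
}
```

Once the converted file exists, it could be read with POI as the other answers describe.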

#4


0  

You might get some results using the OpenOffice API. If it cannot read the file directly, you could probably convert it to a 'supported' format. Otherwise, the schema for the Office 2003 'SpreadsheetML' isn't very complicated. I have successfully created an XSLT scenario to convert a result set (database query) to a (simple yet effective) Excel 2003 document (XML format). The other way around should not be very hard to achieve.

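
The "other way around" could look roughly like the following: a small XSLT 1.0 stylesheet, applied through the JDK's javax.xml.transform API, that flattens SpreadsheetML rows into tab-separated text. This is not the stylesheet the answer refers to. The class name SpreadsheetMlToTsv and the stylesheet itself are illustrative assumptions, and the sketch deliberately ignores ss:Index, so skipped cells are not padded.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: turns SpreadsheetML into tab-separated text with XSLT.
final class SpreadsheetMlToTsv {
    // One line per Row, one tab-separated field per Cell (ss:Index is ignored).
    private static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//ss:Row'>"
      + "<xsl:for-each select='ss:Cell'>"
      + "<xsl:if test='position() &gt; 1'><xsl:text>&#9;</xsl:text></xsl:if>"
      + "<xsl:value-of select='ss:Data'/>"
      + "</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String toTsv(String spreadsheetMl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(spreadsheetMl)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up two-row sample document.
        String xml =
            "<Workbook xmlns='urn:schemas-microsoft-com:office:spreadsheet'"
          + " xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet'>"
          + "<Worksheet ss:Name='Sheet1'><Table>"
          + "<Row><Cell><Data ss:Type='String'>a</Data></Cell>"
          + "<Cell><Data ss:Type='String'>b</Data></Cell></Row>"
          + "<Row><Cell><Data ss:Type='Number'>1</Data></Cell>"
          + "<Cell><Data ss:Type='Number'>2</Data></Cell></Row>"
          + "</Table></Worksheet></Workbook>";
        System.out.print(toTsv(xml));
    }
}
```

A stylesheet keeps the mapping declarative, but handling skipped cells via ss:Index is awkward in XSLT 1.0, which is one reason a hand-written parser may be the simpler route for sparse sheets.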

Cheers, Wim

#5


0  

The answer today was to ask the vendor to change their Excel file format to an Excel binary rather than the old Office XML. Doing so allowed me to use Apache POI 3.7 to read the file with no issues. I appreciate the answers, as I had no idea there was no direct support in the Java-based open source libraries for this old Office XML format. Now I know next time to check earlier to see what format the Excel files are in before committing to a timeline.

#6


0  

I had the same problem some time ago and ended up writing a SAX parser to read the XML file. I wrote a blog post about it here.

You can find the sample project that parses the file on GitHub.
