Exporting a Microsoft Word XML file to docx

I am trying to create a Microsoft Word document without using any third-party libraries. What I am trying to do is:

1. Create a template document in Microsoft Word.
2. Save it as an XML file.
3. Read this XML file and populate the data in PHP.

I have this working so far. I would now like to export the result in *.docx format. However, when I do that, Word reports an error when I try to open the file.

Error message: File is corrupt and cannot be opened

However, when I save it as *.doc, I can open the Word document.

Any idea what could be wrong? Do I need to use a library to export it to a docx file?

Thanks

2 Solutions

#1

Docx is not backwards-compatible with doc. A .docx file is a ZIP package containing several XML parts, not a single XML file, so Word XML saved under a .docx extension is rejected as corrupt. See: Docx Tag Info.
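A quick way to see which kind of file an export produced is to look at its first bytes: a ZIP archive with at least one entry starts with the signature `PK\3\4`, while a plain XML export starts with text such as `<?xml`. The sketch below is in Java to match the other snippet in this thread; the class and method names are hypothetical.

```java
// Hypothetical helper for illustration; not part of any library.
final class ZipSniffer {

    // Returns true when the given leading bytes carry the ZIP local file
    // header signature 0x50 0x4B 0x03 0x04 ("PK\3\4").
    static boolean looksLikeZip(byte[] head) {
        return head.length >= 4
                && head[0] == 'P'
                && head[1] == 'K'
                && head[2] == 3
                && head[3] == 4;
    }
}
```

Pass it the first few bytes of the exported file, for example taken from `java.nio.file.Files.readAllBytes`. If it returns false for a file named *.docx, the file is not a ZIP package and Word will not accept it as docx.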

I would recommend creating a separate template for the docx format, because the two formats are so different.
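As a rough illustration of what a separate docx template involves, the sketch below treats a .docx as a ZIP archive, replaces placeholders in its main part `word/document.xml`, and writes a new archive with all other parts copied unchanged. It is in Java to match the other snippet in this thread. The class name `DocxTemplateFiller`, its methods, and the `${name}` placeholder syntax are hypothetical, and it assumes each placeholder sits inside a single text run, which Word does not guarantee because it can split text across runs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; requires Java 9+ (readAllBytes).
final class DocxTemplateFiller {

    // Reads every entry of a ZIP archive into memory, keeping entry order.
    static Map<String, byte[]> unzip(byte[] archive) {
        Map<String, byte[]> parts = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // readAllBytes stops at the end of the current entry.
                parts.put(entry.getName(), in.readAllBytes());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }

    // Writes the given parts back out as a ZIP archive.
    static byte[] zip(Map<String, byte[]> parts) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> part : parts.entrySet()) {
                out.putNextEntry(new ZipEntry(part.getKey()));
                out.write(part.getValue());
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Escapes the characters that are not allowed in XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Replaces each placeholder key in word/document.xml with its escaped value.
    static byte[] fill(byte[] template, Map<String, String> values) {
        Map<String, byte[]> parts = unzip(template);
        byte[] body = parts.get("word/document.xml");
        if (body == null) {
            throw new IllegalArgumentException("word/document.xml is missing");
        }
        String xml = new String(body, StandardCharsets.UTF_8);
        for (Map.Entry<String, String> value : values.entrySet()) {
            xml = xml.replace(value.getKey(), escapeXml(value.getValue()));
        }
        parts.put("word/document.xml", xml.getBytes(StandardCharsets.UTF_8));
        return zip(parts);
    }
}
```

The template here would be a document saved from Word as .docx rather than as XML. Since the question uses PHP, the same steps should map onto PHP's built-in `ZipArchive` class, so no third-party library is needed for this approach.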

#2

Also, you might want to check that your code reads and writes the files with the correct encoding. Before I set the correct encoding, I was getting odd characters that were not valid once I converted the result into .docx format. To fix this, I set the encoding on the input stream reader:

// template is presumably the java.util.zip.ZipFile of the .docx, entry the current ZipEntry
InputStreamReader isr = new InputStreamReader(template.getInputStream(entry), "UTF-8");
BufferedReader fileContents = new BufferedReader(isr);

I ran this for each entry while enumerating the entries; passing "UTF-8" decodes the content correctly and eliminates the odd characters. I was also getting the text "null" written at the end of some of the XML files, so I removed it (I had read the contents of each file into a string so that I could manipulate it anyway):

String ending = "null";
// Caution: this removes every occurrence of "null", not only a trailing one.
int at;
while ((at = sb.indexOf(ending)) != -1) {
    sb.delete(at, at + ending.length());
}

sb was the StringBuilder I read the contents into. Setting UTF-8 may have solved this problem as well, but I had fixed it before I set the encoding, so I figured I would include it in case it turns out to be a problem. I hope this helps.
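The answer does not show the loop that filled sb, so the following is only a guess at the cause: a literal "null" at the end of the text commonly appears when the result of `readLine()` is appended before it is checked for end of stream. A minimal sketch that decodes a stream as UTF-8 and stops before appending the end-of-stream value is shown below; the class name `Utf8Reader` and the method `readAll` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library.
final class Utf8Reader {

    // Reads the whole stream as UTF-8 text, one line at a time.
    // Each line is terminated with '\n' in the result.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // Check for null BEFORE appending, so "null" never reaches sb.
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

With a loop of this shape there should be nothing to strip afterwards, which avoids deleting a legitimate "null" that happens to appear in the document text.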
