Export a Microsoft Word XML file to docx

Date: 2022-10-30 12:06:03

I am trying to create a Microsoft Word document without using any third-party libraries. My approach is:

  • Create a template document in Microsoft Word

  • Save it as an XML File

  • Read this XML file and populate the data in PHP
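
The "populate the data" step can be sketched as placeholder substitution on the saved XML text. This is a minimal sketch in Java rather than PHP, to match the code in the answer below; the `${name}` placeholder syntax and the `TemplateFiller` class are hypothetical. One caveat: Word often splits text typed in the editor across several `<w:r>` runs, so a placeholder only matches if it appears contiguously in the saved XML. Values are XML-escaped so that characters like `&` cannot break the markup.

```java
import java.util.Map;

/**
 * Sketch: fills hypothetical ${key} placeholders in a Word XML template string.
 * Assumes each placeholder appears contiguously in the XML (not split across runs).
 */
public class TemplateFiller {

    /** Escapes the five XML special characters so inserted data cannot break the markup. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Replaces every ${key} in the template with the escaped value from the map. */
    static String fill(String templateXml, Map<String, String> values) {
        String result = templateXml;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", escapeXml(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<w:p><w:r><w:t>Dear ${name},</w:t></w:r></w:p>";
        // prints <w:p><w:r><w:t>Dear Smith &amp; Sons,</w:t></w:r></w:p>
        System.out.println(fill(xml, Map.of("name", "Smith & Sons")));
    }
}
```

`Map.of` needs Java 9 or later; on older versions a `HashMap` works the same way.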

So far this works. I would like to export it in *.docx format, but when I do, Word reports an error as soon as I try to open the file.

Error message: File is corrupt and cannot be opened


However, when I save it as *.doc, Word opens the document without problems.

Any idea what could be wrong? Do I need a library to export it to a .docx file?



2 solutions



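Docx is not backwards-compatible with doc. A .docx file is a ZIP package containing several XML parts (see Docx Tag Info), so writing plain XML into a file with a .docx extension produces a file Word treats as corrupt.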

I would recommend creating a separate template for the docx format, because the two formats are so different.
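Because a .docx is a ZIP package, the XML has to be wrapped in one. The following is a minimal sketch using only the JDK's `java.util.zip` (no third-party libraries), assuming a bare document with no styles: it writes the three parts commonly cited as the minimum for a WordprocessingML package, `[Content_Types].xml`, `_rels/.rels` and `word/document.xml`. The class name `MinimalDocx` and the output file name `hello.docx` are illustrative, and the body XML passed in is assumed to be already escaped.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Sketch: packages WordprocessingML body XML into a minimal .docx (a ZIP archive). */
public class MinimalDocx {

    static final String CONTENT_TYPES =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
        + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
        + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
        + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
        + "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
        + "</Types>";

    static final String RELS =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
        + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
        + "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>"
        + "</Relationships>";

    /** Wraps body XML (paragraphs) in a w:document element. */
    static String documentXml(String bodyXml) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
            + "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
            + "<w:body>" + bodyXml + "</w:body></w:document>";
    }

    /** Writes one ZIP entry, encoding the XML text explicitly as UTF-8. */
    static void putEntry(ZipOutputStream zip, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(xml.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    /** Writes the whole package to the given stream (the stream is closed afterwards). */
    static void write(OutputStream out, String bodyXml) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            putEntry(zip, "[Content_Types].xml", CONTENT_TYPES);
            putEntry(zip, "_rels/.rels", RELS);
            putEntry(zip, "word/document.xml", documentXml(bodyXml));
        }
    }

    public static void main(String[] args) throws IOException {
        String body = "<w:p><w:r><w:t>Hello from a hand-built .docx</w:t></w:r></w:p>";
        try (FileOutputStream fos = new FileOutputStream("hello.docx")) {
            write(fos, body);
        }
    }
}
```

A template that Word itself saved as .docx contains more parts (styles, settings, fonts). One way to keep that formatting is to copy every entry of the template unchanged and replace only `word/document.xml` with the populated XML.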




Also, check that your code reads and writes the correct encoding. Before I specified the encoding, I was getting garbled characters when I converted the file to .docx. I set it on the input stream:


InputStreamReader isr= new InputStreamReader(template.getInputStream(entry), "UTF-8");
BufferedReader fileContents = new BufferedReader(isr);

I used this while enumerating the template's entries; the "UTF-8" argument makes the reader decode the XML correctly and eliminated the odd characters. I was also getting "null" printed at the end of some of the XML files, so I removed it (I had already read the contents of each file into a StringBuilder so I could manipulate it):


String ending = "null";
int idx;
// Remove every occurrence of the stray "null" text from sb
while ((idx = sb.indexOf(ending)) != -1) {
    sb.delete(idx, idx + ending.length());
}

Here sb is the StringBuilder I read the contents into. The UTF-8 change may have fixed this problem as well, but I had fixed it before adding the encoding, so I'm including it in case it turns out to be a problem for you. I hope this helps.
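If the template is itself a .docx, enumerating its entries and reading them as UTF-8 can be combined roughly as below. This is a sketch, and the class and method names are hypothetical. A stray "null" at the end of text is a typical symptom of appending the return value of `readLine()` before checking it for null (a guess at the cause here, not confirmed). Checking the read result before appending avoids it, and also avoids the delete loop removing legitimate occurrences of the word "null" from the document. This version reads in fixed-size chunks so the original line breaks are preserved, and it skips binary parts such as images, which would be corrupted if decoded as text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Sketch: reads the XML parts of a .docx template as UTF-8 text. */
public class TemplateReader {

    /** Reads one entry as UTF-8 text, preserving its content exactly. */
    static String readEntry(ZipFile zip, ZipEntry entry) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(zip.getInputStream(entry), StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n;
            // read() returns -1 at end of stream, so nothing is appended after the last chunk
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    /** Enumerates the package and returns the text of every XML part, keyed by entry name. */
    static Map<String, String> readXmlParts(String path) throws IOException {
        Map<String, String> parts = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Binary parts such as word/media/*.png must not be decoded as text
                if (name.endsWith(".xml") || name.endsWith(".rels")) {
                    parts.put(name, readEntry(zip, entry));
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TemplateReader template.docx
        for (Map.Entry<String, String> part : readXmlParts(args[0]).entrySet()) {
            System.out.println(part.getKey() + ": " + part.getValue().length() + " chars");
        }
    }
}
```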




