How can I convert a docx file to HTML with formatting using Open XML?

Time: 2021-08-17 06:20:21

I know there are a lot of questions with the same title, but I am currently having an issue that they don't address, and I haven't found the correct way to proceed.


I am using Open XML SDK 2.5 along with PowerTools to convert a .docx file to an .html file, using the HtmlConverter class for the conversion.


I can successfully convert the docx file to an HTML file, but the HTML file doesn't retain the original formatting of the document: font size, color, underline, bold, etc. are not reflected in the HTML file.


Here is my existing code:


// Needs: using System.IO; using System.Xml.Linq; using DocumentFormat.OpenXml.Packaging; using OpenXmlPowerTools;
public void ConvertDocxToHtml(string fileName)
{
   byte[] byteArray = File.ReadAllBytes(fileName);
   using (MemoryStream memoryStream = new MemoryStream())
   {
      memoryStream.Write(byteArray, 0, byteArray.Length);
      using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
      {
         HtmlConverterSettings settings = new HtmlConverterSettings()
         {
            PageTitle = "My Page Title"
         };
         XElement html = HtmlConverter.ConvertToHtml(doc, settings);
         File.WriteAllText(@"E:\Test.html", html.ToStringNewLineOnAttributes());
      }
   }
}

So I just want to know whether there is any way to retain the formatting in the converted HTML file.


I know about some third-party APIs that do the same thing, but I would prefer a way to do this using Open XML or another open-source tool.


4 Solutions



PowerTools for Open XML just released a new HtmlConverter module. It now contains an open source, free implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See http://bit.ly/1bclyg9
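As a minimal sketch of what using the CSS-capable converter might look like, assuming a PowerTools version whose HtmlConverterSettings exposes FabricateCssClasses, CssClassPrefix, and AdditionalCss (names can differ between releases, so check the version you have installed). The class name DocxToHtml, the method name ToHtml, the "pt-" prefix, and the CSS string are illustrative only. Images are not handled in this sketch.

```csharp
using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public static class DocxToHtml
{
    // Converts a .docx file to an HTML string, asking the converter to emit CSS for formatting.
    public static string ToHtml(string docxPath)
    {
        byte[] bytes = File.ReadAllBytes(docxPath);
        using (var stream = new MemoryStream())
        {
            // Work on an in-memory copy so the source file is never modified.
            stream.Write(bytes, 0, bytes.Length);
            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
                var settings = new HtmlConverterSettings
                {
                    PageTitle = "My Page Title",
                    FabricateCssClasses = true, // assumption: generates CSS classes for formatting
                    CssClassPrefix = "pt-",     // illustrative prefix for the generated classes
                    AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; }" // illustrative
                };
                XElement html = HtmlConverter.ConvertToHtml(doc, settings);
                return html.ToString(SaveOptions.DisableFormatting);
            }
        }
    }
}
```

The result could then be written out with something like File.WriteAllText(@"E:\Test.html", DocxToHtml.ToHtml(fileName)).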




You might want to find an external tool to help you do this, such as Aspose.Words.




Your end result will not look exactly the way your Word Document turns out, but this link might help.




You can use the OpenXML Viewer extension for Firefox to convert with formatting (http://openxmlviewer.codeplex.com). This works for me. Hope this helps.




