I'd like to extract the content of an MS Word 2003 document into HTML in C#.
Any ideas?
2 solutions
#1
I think this is the easiest way to do it
http://asptutorials.net/C-SHARP/convert-ms-word-docs-to-html/
The key point in the article is that it uses the SaveAs function: http://msdn.microsoft.com/en-us/library/aa220734.aspx
Like this:
// Requires: using System.IO; using Word = Microsoft.Office.Interop.Word;
// Path.ChangeExtension swaps only the real extension, so names such as "my.docs.doc" stay intact.
string newfilename = Path.Combine(folder_to_save_in, Path.ChangeExtension(FileUpload1.FileName, ".html"));
// Placeholder for every optional argument we do not set:
object o_nullobject = System.Reflection.Missing.Value;
object o_newfilename = newfilename;
object o_format = Word.WdSaveFormat.wdFormatHTML;
object o_encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
object o_endings = Word.WdLineEndingType.wdCRLF;
// SaveAs requires lots of parameters, but we can leave most of them empty:
wordApplication.ActiveDocument.SaveAs(ref o_newfilename, ref o_format, ref o_nullobject,
    ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
    ref o_nullobject, ref o_nullobject, ref o_encoding, ref o_nullobject,
    ref o_nullobject, ref o_endings, ref o_nullobject);
The library is Microsoft.Office.Interop.Word.
If I remember correctly, Word is required on the machine where the code is executed. If it's ASP.NET, it is required on the server.
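The snippet above assumes wordApplication already has a document open. Here is a rough sketch of the whole round trip (start Word, open the file, save as HTML, shut Word down again). The WordToHtml class and Convert method are names I made up for illustration. It also assumes C# 4 or later, which lets you skip the optional COM arguments instead of passing Missing.Value for each one. I haven't run it against every Word version, so treat it as a starting point.

```csharp
using System.IO;
using Word = Microsoft.Office.Interop.Word;

static class WordToHtml
{
    // Hypothetical helper: converts one .doc file and returns the path of the .html file.
    public static string Convert(string docPath, string outputFolder)
    {
        // Word resolves relative paths against its own working directory,
        // so pass absolute paths in both directions.
        string source = Path.GetFullPath(docPath);
        string target = Path.Combine(Path.GetFullPath(outputFolder),
            Path.GetFileNameWithoutExtension(source) + ".html");

        // Typing the variables as _Application / _Document avoids the
        // method-versus-event ambiguity warnings on Quit and Close.
        Word._Application app = new Word.Application();
        Word._Document doc = null;
        try
        {
            // Assumption: C# 4+ COM interop, so unused optional parameters are omitted.
            doc = app.Documents.Open(FileName: source, ReadOnly: true);
            doc.SaveAs(FileName: target, FileFormat: Word.WdSaveFormat.wdFormatHTML);
            return target;
        }
        finally
        {
            // Always close the document and quit, otherwise WINWORD.EXE stays running.
            if (doc != null)
            {
                doc.Close(SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
            }
            app.Quit();
        }
    }
}
```

Usage would look like string html = WordToHtml.Convert(@"C:\uploads\report.doc", @"C:\output"); where both paths are only examples.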
#2
Three ways:
1. Save as HTML, as described by napster.
2. Transform the Open XML to HTML; the XSLT is available at http://www.codeplex.com/OpenXMLViewer
3. For the cleanest HTML, write code to convert each style in the document to CSS, and put any direct formatting in @style.
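For option 2, the transform step itself is short in .NET. This is only a sketch under a few assumptions: the Word 2003 file has already been converted to Open XML (.docx), the main part (word/document.xml) has been extracted from the package, and the stylesheet accepts that part as its input. I haven't checked how the OpenXMLViewer stylesheet expects to be fed, so the class name and all three paths are hypothetical.

```csharp
using System.Xml.Xsl;

static class OpenXmlToHtml
{
    // Hypothetical helper: applies an XSLT stylesheet to an extracted
    // document.xml and writes the result to htmlPath.
    public static void Transform(string documentXmlPath, string xsltPath, string htmlPath)
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);                        // compile the stylesheet
        xslt.Transform(documentXmlPath, htmlPath);  // input XML file -> output HTML file
    }
}
```

Unlike the SaveAs route, this step does not need Word on the machine, although getting from a binary .doc to Open XML in the first place may.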
Is Word installed on the computer running your C# code?