I want to be able to upload an MS Word document and export it as a page on my site.
Is there any way to accomplish this?
5 solutions
#1
20
// Read a .docx file and return its text content as a string
function readDocx($filePath) {
    // A .docx file is a ZIP archive; the document body is stored in word/document.xml
    $zip = new ZipArchive;
    $dataFile = 'word/document.xml';
    // ZipArchive::open() returns true on success or an integer error code
    if (true === $zip->open($filePath)) {
        // Search for the data file in the archive
        if (($index = $zip->locateName($dataFile)) !== false) {
            // If found, read it into a string, then close the archive
            $data = $zip->getFromIndex($index);
            $zip->close();
            // Load the XML, suppressing errors and warnings.
            // loadXML() is an instance method; calling it statically fails on PHP 8.
            // LIBXML_NOENT is left out so entities in an uploaded file are not expanded.
            $xml = new DOMDocument();
            if ($data === false || !$xml->loadXML($data, LIBXML_NOERROR | LIBXML_NOWARNING)) {
                return "";
            }
            // Strip the XML tags and remove line breaks ("\n" must be double-quoted)
            return str_replace("\n", '', strip_tags($xml->saveXML()));
        }
        $zip->close();
    }
    // In case of failure return an empty string
    return "";
}
ZipArchive and DOMDocument both ship with PHP (as the zip and dom extensions, which must be enabled), so you don't need to install, include, or require any additional libraries.
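Since the goal is a page on the site rather than one long string, the same approach can be extended to keep paragraph breaks. The following is only a sketch: docxToHtmlParagraphs() is a hypothetical helper name, it assumes the usual (transitional) WordprocessingML namespace, and it ignores formatting, images, and table layout.

```php
<?php
// Sketch: turn the paragraphs of a .docx file into simple HTML <p> elements.
// docxToHtmlParagraphs() is a hypothetical helper, not part of PHP itself.
function docxToHtmlParagraphs(string $filePath): string
{
    $zip = new ZipArchive();
    // open() returns true on success or an integer error code
    if ($zip->open($filePath) !== true) {
        return '';
    }
    $data = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($data === false) {
        return '';
    }

    $dom = new DOMDocument();
    if (!$dom->loadXML($data, LIBXML_NOERROR | LIBXML_NOWARNING)) {
        return '';
    }

    // Assumption: the document uses the transitional WordprocessingML namespace
    $ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
    $html = '';
    // Each w:p element is a paragraph; its text is held in w:t descendants
    foreach ($dom->getElementsByTagNameNS($ns, 'p') as $paragraph) {
        $text = '';
        foreach ($paragraph->getElementsByTagNameNS($ns, 't') as $run) {
            $text .= $run->textContent;
        }
        if ($text !== '') {
            // Escape the text so document content cannot inject markup
            $html .= '<p>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "</p>\n";
        }
    }
    return $html;
}
```

The returned string can be echoed into a page template. Escaping with htmlspecialchars() matters here because the document comes from an upload.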
#2
3
One may use PHPDocX.
It supports practically all HTML CSS styles. Moreover, you may use templates to add extra formatting to your HTML via the replaceTemplateVariableByHTML method.
The HTML methods of PHPDocX also allow for the direct use of Word styles. You may use something like this:
$docx->embedHTML($myHTML, array('tableStyle' => 'MediumGrid3-accent5PHPDOCX'));
This makes all your tables use the MediumGrid3-accent5 Word style. The embedHTML method, as well as its version for templates (replaceTemplateVariableByHTML), preserves inheritance, meaning that you may use a predefined Word style and override any of its properties with CSS.
You may also extract selected parts of your HTML using jQuery-style selectors.
#4
1
You can convert Word docx documents to HTML using the Print2flash library. Here is a PHP excerpt from my client's site that converts a document to HTML:
include("const.php");
$p2fServ = new COM("Print2Flash4.Server2");
$p2fServ->DefaultProfile->DocumentType=HTML5;
$p2fServ->ConvertFile($wordfile,$htmlFile);
It converts the document whose path is given in the $wordfile variable to an HTML page file whose path is given in the $htmlFile variable. All formatting, hyperlinks, and charts are retained. You can get the required const.php file, together with a fuller sample, from the Print2flash SDK.
#5
0
If you don't mind using a REST API, you can use:
- Apache Tika, a proven open-source leader for text extraction.
- RawText, if you don't want the hassle of configuration and want a ready-to-go solution; note that it's not free.
Sample code for RawText:
$result = $rawText->parse($your_file);
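For the Apache Tika option, a minimal sketch might look like the following. It assumes a Tika server is already running and reachable at http://localhost:9998/tika (the default port of the standalone server), and that PHP's curl extension is enabled. extractTextWithTika() and the $tikaUrl parameter are hypothetical names used for illustration.

```php
<?php
// Sketch: extract plain text from a document through a running Apache Tika server.
// extractTextWithTika() is a hypothetical helper; $tikaUrl is an assumption and
// should point at wherever your Tika server actually listens.
function extractTextWithTika(string $filePath, string $tikaUrl = 'http://localhost:9998/tika'): ?string
{
    if (!is_readable($filePath)) {
        return null;
    }
    $ch = curl_init($tikaUrl);
    curl_setopt_array($ch, [
        // Tika's /tika endpoint accepts the raw document as the body of a PUT request
        CURLOPT_CUSTOMREQUEST  => 'PUT',
        CURLOPT_POSTFIELDS     => file_get_contents($filePath),
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/octet-stream', // let Tika detect the real type
            'Accept: text/plain',                     // ask for plain text back
        ],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // Return null on transport errors or any non-200 response
    return ($body !== false && $status === 200) ? $body : null;
}
```

Calling extractTextWithTika($uploadedPath) returns the extracted text, or null if the file is unreadable or the server did not answer with HTTP 200.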