Full-text indexing of Excel files in Plone

Date: 2022-10-22 07:23:16

How can I customize the Plone search engine to activate full-text indexing of Excel files? I have already installed pdftotext and wv for full-text indexing of PDF and Word files.

2 answers

#1


5  

If you add Products.OpenXml to your instance eggs and install it in Plone you can index modern Office formats, at least .docx and .xlsx. For plain old Excel (.xls) files this does not work.

I tried it in a Plone 4.3.2 buildout config a few weeks ago:

[instance]
eggs =
    ...
    Products.OpenXml

[versions]
# You need a more recent lxml than default Plone, some 3.x version
lxml = 3.3.3
Products.OpenXml = 1.1.1
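
To make concrete what "indexing an .xlsx file" involves, here is a minimal sketch of pulling the text out of a workbook with the Python standard library only. It is not the code Products.OpenXml uses; it only illustrates that an .xlsx file is a zip package whose cell strings usually live in xl/sharedStrings.xml. The helper make_demo_package is hypothetical and builds a stripped-down stand-in for a real workbook so the example runs on its own.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts inside an .xlsx package.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def xlsx_shared_strings(data):
    """Return the text of every shared string in an .xlsx file given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        if "xl/sharedStrings.xml" not in package.namelist():
            return []  # the workbook holds no shared strings
        root = ET.fromstring(package.read("xl/sharedStrings.xml"))
    # Each <si> is one string; rich text splits it across several <t> runs.
    return ["".join(t.text or "" for t in si.iter(NS + "t"))
            for si in root.iter(NS + "si")]


def make_demo_package(strings):
    """Build a stripped-down stand-in for an .xlsx file (hypothetical test data)."""
    items = "".join("<si><t>%s</t></si>" % s for s in strings)
    xml = ('<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
           + items + "</sst>")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("xl/sharedStrings.xml", xml)
    return buf.getvalue()


demo = make_demo_package(["Quarterly report", "Revenue"])
print(" ".join(xlsx_shared_strings(demo)))
```

Run it with a standalone Python 3 rather than inside the Plone 4.3 instance. The space-joined strings are roughly the kind of plain text a transform would hand to the catalog; numbers and formulas stored directly in the sheet parts are not covered by this sketch.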

Alternatively or additionally, use Products.AROfficeTransforms. I have only tried it in combination with Products.OpenXml, but Products.AROfficeTransforms on its own is sufficient if you are only interested in old-style Excel sheets (.xls). In a buildout config:

[instance]
eggs =
    ...
    Products.AROfficeTransforms

[versions]
Products.AROfficeTransforms = 0.11.0

It requires the xlhtml binary to be installed on your system. This is an ancient binary, last changed in 2002. I did not try to install it myself.
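
Since these transforms depend on external converters, it can help to confirm that the binaries are actually on the PATH of the user running Plone before debugging the index. The following is a small sketch for that, meant to be run with a standalone Python 3 (shutil.which is not available in the Python 2.7 that Plone 4.3 uses). The list of names is an assumption: the wv package ships several executables, so add whichever one your setup calls.

```python
import shutil


def missing_binaries(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# Converters mentioned in this thread; extend the list for your own setup.
required = ["pdftotext", "xlhtml"]
missing = missing_binaries(required)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All converters found.")
```

Run it as the same system user that starts the Plone instance, because PATH can differ between users.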

#2


1  

Try ftw.tika

Supported formats:

  • Microsoft Office formats (Office Open XML)
      • *.docx Word Documents
      • *.dotx Word Templates
      • *.xlsx Excel Sheets
      • *.xltx Excel Templates
      • *.pptx PowerPoint Presentations
      • *.potx PowerPoint Templates
      • *.ppsx PowerPoint Slideshows
  • Legacy Microsoft Office (97) formats
  • Rich Text Format
  • OpenOffice ODF formats
  • OpenOffice 1.x formats
  • Common Adobe formats (InDesign, Illustrator, Photoshop)
  • PDF documents
  • WordPerfect documents
  • E-Mail messages

It's based on Apache Tika and runs as a service managed by supervisor (you have to extend your buildout).
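
To check by hand what Tika extracts from a given spreadsheet, independently of Plone, you can call the tika-app jar directly. The sketch below is not how ftw.tika talks to Tika; it only wraps the tika-app command line, whose --text option prints the plain-text content. The jar and document paths are hypothetical placeholders, and extract_text needs Java and the jar to be present.

```python
import subprocess


def tika_text_command(jar_path, document_path):
    """Build the command line that asks tika-app for plain-text output."""
    return ["java", "-jar", jar_path, "--text", document_path]


def extract_text(jar_path, document_path):
    """Run tika-app and return the extracted text (needs Java and the jar)."""
    result = subprocess.run(tika_text_command(jar_path, document_path),
                            capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


# Hypothetical paths; adjust them to where the jar and your file actually are.
print(" ".join(tika_text_command("tika-app.jar", "budget.xls")))
```

If extract_text returns the cell contents for a file, Tika can read it, and any remaining search problem is more likely on the Plone side (transform registration or reindexing).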

It integrates with portal_transforms and is well tested and documented.

More information:

  • Release on PyPI
