Extract an HTML table from a PDF file using PHP?

Is it possible to extract a table of data from a PDF file into an array or similar structure, so that I can import the table data using PHP? I have DomPDF installed to create PDF files, but it has no options for reading PDFs. If I read the PDF file directly in PHP, I just get an encoded string:


%PDF-1.5 5 0 obj <>>> endobj 6 0 obj <>stream x��ێ+��W�\`��E��u


Any help would be appreciated.


Adam

1 Answer

#1

This post is pretty old but seems to have a decent amount of views.


I'm working on a similar project and have had some success with https://github.com/mgufrone/pdf-to-html . The HTML it returns is just a bunch of absolutely positioned p tags, but if the format of your PDFs is consistent, you may be able to work out a way to parse the table from those positions, or at least pull out the data you need.


Just make sure that you have the poppler utilities installed.
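
To illustrate the "parse the table" idea, here is a minimal sketch of how the absolutely positioned p tags could be grouped back into rows. It assumes each p tag carries an inline style with `top` and `left` values in pixels, which may not match your output exactly. The function name `positionedParagraphsToRows` and the `$tolerance` parameter are made up for this example and are not part of the pdf-to-html library. It only takes the HTML string, however you obtained it, and needs PHP 7.4+ for the arrow functions.

```php
<?php
/**
 * Group absolutely positioned <p> elements into table rows.
 * Hypothetical helper: assumes inline styles such as
 * "position:absolute;top:100px;left:50px" on every <p>.
 */
function positionedParagraphsToRows(string $html, int $tolerance = 2): array
{
    $doc = new DOMDocument();
    // Converter output is rarely perfectly valid HTML, so silence parser warnings.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common trick to make loadHTML treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Collect the position and text of every <p> that has both top and left.
    $cells = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $style = $p->getAttribute('style');
        if (!preg_match('/(?<![\w-])top:\s*(-?\d+)/', $style, $top)
            || !preg_match('/(?<![\w-])left:\s*(-?\d+)/', $style, $left)) {
            continue;
        }
        $cells[] = [
            'top'  => (int) $top[1],
            'left' => (int) $left[1],
            'text' => trim($p->textContent),
        ];
    }

    // Walk the cells top to bottom; start a new row whenever the vertical
    // offset moves more than $tolerance pixels past the current row's top.
    usort($cells, fn($a, $b) => [$a['top'], $a['left']] <=> [$b['top'], $b['left']]);
    $rows = [];
    $rowTop = null;
    foreach ($cells as $cell) {
        if ($rowTop === null || $cell['top'] - $rowTop > $tolerance) {
            $rows[] = [];
            $rowTop = $cell['top'];
        }
        $rows[count($rows) - 1][] = $cell;
    }

    // Inside each row, order cells left to right and keep only the text.
    return array_map(function (array $row): array {
        usort($row, fn($a, $b) => $a['left'] <=> $b['left']);
        return array_column($row, 'text');
    }, $rows);
}
```

Calling it on the converted HTML gives you an array of rows, each an array of cell strings. Empty cells are not detected; if your table has gaps you would need to map the `left` values to fixed column positions instead of just sorting them.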

