Extracting text from a PDF in PHP

Date: 2023-01-15 12:37:41

I'm creating a PHP-based web application that lets the user upload a PDF file. The file will then be read and checked for certain data (text).

The problem is that I can't figure out how to even open a PDF file in PHP. There are some PDF libraries, but they're mainly for creating PDFs and don't seem to be very good at reading them.

An alternative would be to use an existing solution in Python or another language (as described in other threads on this site), but I'd really like to stay in PHP as much as possible, since I intend to export the data to MySQL etc. later.

Any input on how to read a PDF and extract data from it would be much appreciated.

1 answer

#1


I haven't tried this myself, but this one looks like it works: http://www.pdfparser.org/documentation. It's just a matter of downloading it and including it in your code, as the documentation shows.

Or you could try class.pdf2text.php, found at http://www.phpclasses.org/browse/file/31030.html
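For what it's worth, here is a rough, untested sketch of what the first suggestion might look like. It assumes the library behind pdfparser.org is the Composer package smalot/pdfparser, installed under ./vendor next to the script. The function names extractPdfText and findTerms, the file path, and the search terms are made up for illustration and aren't part of the library.

```php
<?php

/**
 * Extract all text from a PDF file.
 * Assumption: smalot/pdfparser was installed with Composer into ./vendor.
 */
function extractPdfText(string $path): string
{
    // Loaded lazily so the rest of this file works without the library.
    require_once __DIR__ . '/vendor/autoload.php';

    $parser = new \Smalot\PdfParser\Parser();
    $pdf = $parser->parseFile($path);

    return $pdf->getText();
}

/**
 * Return the terms from $needles that occur in $text (case-insensitive).
 * Hypothetical helper for the "check for certain data" step.
 */
function findTerms(string $text, array $needles): array
{
    return array_values(array_filter(
        $needles,
        fn (string $needle): bool => $needle !== '' && stripos($text, $needle) !== false
    ));
}

// Usage, with a hypothetical upload path and search terms:
// $text = extractPdfText('/path/to/upload.pdf');
// print_r(findTerms($text, ['invoice', 'total']));
```

Keeping the search step separate from the extraction means you could swap in class.pdf2text.php (or anything else that returns a string) without touching the checking code. Note that scanned PDFs that contain only images won't yield any text this way.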
