PHP exec() and Tesseract: "Cannot open input file"

Date: 2022-09-24 08:54:15

I use Ghostscript to convert PDF files into JPG images and then run Tesseract to save the text content, as follows:

  • Ghostscript is located in c:\engine\gs\
  • Tesseract is located in c:\engine\tesseract\
  • The web directory for the PDF/JPG/TXT files is file/tmp/

Code:

$pathgs = "c:\\engine\\gs\\";
$pathtess = "c:\\engine\\tesseract\\";
$pathfile = "file/tmp/";

// Strip images
putenv("PATH=".$pathgs);
$exec = "gs -dNOPAUSE -sDEVICE=jpeg -r300 -sOutputFile=".$pathfile."strip%d.jpg ".$pathfile."upload.pdf -q -c quit";
shell_exec($exec);

// OCR
putenv("PATH=".$pathtess);
$exec = "tesseract.exe '".$pathfile."strip1.jpg' '".$pathfile."ocr' -l eng";
exec($exec, $msg);
print_r($msg);
echo file_get_contents($pathfile."ocr.txt");

Extracting the image (it's just one page) works fine, but Tesseract outputs:

Array
  (
    [0] => Tesseract Open Source OCR Engine v3.01 with Leptonica
    [1] => Cannot open input file: 'file/tmp/strip1.jpg'
  )

and no ocr.txt file is generated, which leads to a 'failed to open stream' error in PHP.

  • Copying strip1.jpg into the c:/engine/tesseract/ folder and running Tesseract from the command line (tesseract strip1.jpg ocr.txt -l eng) works without any issue.
  • Replacing the putenv() call with exec(c:/engine/tesseract/tesseract ... ) returns the above-mentioned error.
  • Keeping strip1.jpg in the Tesseract folder and running exec(tesseract 'c:/engine/tesseract/strip1.jpg' ... ) returns the above-mentioned error.
  • Omitting the apostrophes around path/strip1.jpg returns an empty array as the message and does not create the ocr.txt file.
  • Writing the command directly into the exec() call instead of using $exec makes no difference.

What am I doing wrong?

2 Solutions

#1


Perhaps missing environment variables in PHP are the problem here. Have a look at my question here to see whether setting HOME or PATH sorts this out.
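
One thing worth checking along these lines: putenv("PATH=...") as used in the question sets PATH to that single directory rather than extending it. The following is only a minimal sketch of extending it instead; prepend_to_path is a hypothetical helper name, not part of PHP or of the original post, and whether this alone fixes the Tesseract error is an assumption that would need testing.

```php
<?php
// Hypothetical helper (not from the original post): put a directory at the
// front of PATH while keeping the entries that are already there.
function prepend_to_path(string $dir): string
{
    $current = getenv('PATH');
    // Split the existing PATH; PATH_SEPARATOR is ";" on Windows, ":" elsewhere.
    $parts = ($current === false || $current === '')
        ? array()
        : explode(PATH_SEPARATOR, $current);

    // Drop trailing slashes or backslashes, then put the new directory first.
    array_unshift($parts, rtrim($dir, "\\/"));

    // Remove exact duplicates and write the new value into the environment.
    $new = implode(PATH_SEPARATOR, array_unique($parts));
    putenv('PATH=' . $new);

    return $new;
}
```

With the paths from the question, this would be called as prepend_to_path($pathtess) before exec(), so that the rest of PATH stays available to the child process.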

#2


Halfer, you made my day :-)

Not exactly the way described in your post, but like this:

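// Build the absolute path to file/tmp/ from the location of the running script
$path = str_replace("index.php", "../".$pathfile, $_SERVER['SCRIPT_FILENAME']);

// Descriptors for the child process: stdin, stdout, stderr
$descriptors = array(
   0 => array("pipe", "r"),
   1 => array("pipe", "w"),
   2 => array("pipe", "w")
);
// Run the command with the Tesseract folder as the working directory
$cwd = $pathtess;
$command = "tesseract ".$path."strip1.jpg ".$path."ocr -l eng";

$process = proc_open($command, $descriptors, $pipes, $cwd);

if(is_resource($process)) {
    // Close all pipes before calling proc_close()
    fclose($pipes[0]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    proc_close($process);
}

echo file_get_contents($path."ocr.txt");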
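
The code above closes the pipes without reading them, so any message Tesseract prints is lost. As a minimal sketch of the same proc_open() approach that also returns the output and exit code: run_command is a hypothetical helper name, not part of PHP or of the original answer, and it assumes the command produces only a small amount of output (reading stdout fully before stderr can block if stderr fills up first).

```php
<?php
// Hypothetical helper (not from the original post): run a command via
// proc_open() and return what it wrote to stdout/stderr plus its exit code.
function run_command(string $command, ?string $cwd = null): array
{
    $descriptors = array(
        0 => array("pipe", "r"),  // child's stdin
        1 => array("pipe", "w"),  // child's stdout
        2 => array("pipe", "w"),  // child's stderr
    );

    $process = proc_open($command, $descriptors, $pipes, $cwd);
    if (!is_resource($process)) {
        return array('stdout' => '', 'stderr' => 'proc_open failed', 'exit' => -1);
    }

    fclose($pipes[0]);                         // nothing to send to the child
    $stdout = stream_get_contents($pipes[1]);  // read everything the child printed
    fclose($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);  // read the error output
    fclose($pipes[2]);

    $exit = proc_close($process);              // exit code of the child process

    return array('stdout' => $stdout, 'stderr' => $stderr, 'exit' => $exit);
}

// Usage sketch with the variables from the answer (left commented out because
// it needs Tesseract installed); escapeshellarg() quotes each path for the
// current platform instead of hand-written apostrophes:
// $command = 'tesseract ' . escapeshellarg($path . 'strip1.jpg') . ' '
//          . escapeshellarg($path . 'ocr') . ' -l eng';
// $result = run_command($command, $pathtess);
// print_r($result);
```

Checking $result['exit'] and $result['stderr'] before calling file_get_contents() makes it possible to see why ocr.txt was not created instead of only getting the 'failed to open stream' warning.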

Thanks for your support! Best regards, David
