PDF: How to optimize file size and convert to PNG (embedded font issues)

Time: 2021-11-19 09:00:44

I have a PDF with embedded fonts that I can't seem to work with. Right now, I'm using Ghostscript and trying to do two things:

  • Minimize the file size of the PDF:

    gswin32c -dSAFER -dBATCH -dNOPAUSE -dQUIET -sDEVICE=pdfwrite -sOutputFile=output.pdf input.pdf

  • Convert the PDF to PNG (a supersampled, high-resolution image used as the source for creating other thumbnails):

    gswin32c -dSAFER -dBATCH -dNOPAUSE -dQUIET -dFirstPage=1 -dLastPage=1 -r288 -sDEVICE=png16m -sOutputFile=output.png input.pdf

The commands above work well on scanned documents. But when I run them against PDFs with embedded fonts (the PDFs are generated on the fly by an application), they fail. Here's the error I get:

GPL Ghostscript 8.71: Warning: 'loca' length 274 is greater than numGlyphs 136 in the font UUGHDE+ArialMT.
GPL Ghostscript 8.71: Warning: 'loca' length 274 is greater than numGlyphs 136 in the font UUGHDE+ArialMT.
GPL Ghostscript 8.71: Warning: 'loca' length 188 is greater than numGlyphs 93 in the font UUGHDE+Arial-BoldMT.
GPL Ghostscript 8.71: Warning: 'loca' length 188 is greater than numGlyphs 93 in the font UUGHDE+Arial-BoldMT.

Aside from Ghostscript, I also have access to PDFtk and ImageMagick (which might be replaced with GraphicsMagick). I'm also open to other solutions.

Development is on WAMP. Deployment is to LAMP.

Suggestions?

1 solution

#1

The fonts used inside your PDFs seem to be OpenType fonts. The software that created these PDFs seems to have subsetted the fonts. During font embedding and subsetting by this software (the application that "generates the PDFs on the fly"; was it also Ghostscript?!?), something seems to have gone wrong, leaving the embedded fonts not 100% compliant with the specification.
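
The UUGHDE+ prefix on both font names in the log is what marks them as subsets: the PDF specification tags a subsetted font's name with six uppercase letters followed by a plus sign. A minimal Python sketch that splits off such a tag (the helper name is hypothetical, not part of Ghostscript, PDFtk or ImageMagick):

```python
import re
from typing import Optional, Tuple

# PDF subset-font naming convention: six uppercase letters, a plus sign, then the base font name
SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_name(font_name: str) -> Tuple[Optional[str], str]:
    """Return (subset_tag, base_name); subset_tag is None if the font is not tagged as a subset."""
    match = SUBSET_TAG.match(font_name)
    if match is None:
        return None, font_name
    return match.group(1), match.group(2)
```

For example, split_subset_name("UUGHDE+Arial-BoldMT") yields the tag "UUGHDE" and the base name "Arial-BoldMT".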

'loca' tables are part of OpenType font descriptions (those with TrueType outlines). They index the location of each glyph's outline data in the 'glyf' table.
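
To inspect such a mismatch yourself, you could compare the size of the 'loca' table with numGlyphs from the 'maxp' table. In a consistent TrueType-outline font, 'loca' has exactly numGlyphs + 1 entries, each 2 bytes (short format) or 4 bytes (long format), as selected by indexToLocFormat in the 'head' table. The Python sketch below assumes you have already extracted the embedded font program to raw bytes (typically the FontFile2 stream for TrueType fonts), which is not shown. All function names are hypothetical, fake_sfnt only builds a stripped-down test blob rather than a usable font, and the sketch does not claim to reproduce Ghostscript 8.71's exact arithmetic.

```python
import struct
from typing import Dict, Tuple


def sfnt_tables(data: bytes) -> Dict[str, Tuple[int, int]]:
    """Map each table tag of an sfnt (TrueType/OpenType) font to its (offset, length)."""
    (num_tables,) = struct.unpack_from(">H", data, 4)  # numTables follows the 4-byte sfntVersion
    tables = {}
    for i in range(num_tables):
        # Table records start at byte 12; each is 16 bytes: tag, checksum, offset, length
        tag, _checksum, offset, length = struct.unpack_from(">4sIII", data, 12 + 16 * i)
        tables[tag.decode("latin-1")] = (offset, length)
    return tables


def loca_entries_and_num_glyphs(data: bytes) -> Tuple[int, int]:
    """Return (number of 'loca' entries, maxp.numGlyphs); consistent fonts have entries == numGlyphs + 1."""
    tables = sfnt_tables(data)
    head_offset, _ = tables["head"]
    maxp_offset, _ = tables["maxp"]
    _, loca_length = tables["loca"]
    # head.indexToLocFormat sits at byte 50: 0 = short offsets (uint16), 1 = long offsets (uint32)
    (index_to_loc_format,) = struct.unpack_from(">h", data, head_offset + 50)
    entry_size = 2 if index_to_loc_format == 0 else 4
    # maxp.numGlyphs follows the 4-byte table version
    (num_glyphs,) = struct.unpack_from(">H", data, maxp_offset + 4)
    return loca_length // entry_size, num_glyphs


def fake_sfnt(num_glyphs: int, loca_entries: int, long_format: bool = False) -> bytes:
    """Hypothetical test fixture: head/maxp/loca tables only, zeroed checksums; not a usable font."""
    head = bytearray(54)
    struct.pack_into(">h", head, 50, 1 if long_format else 0)
    maxp = bytearray(32)
    struct.pack_into(">IH", maxp, 0, 0x00010000, num_glyphs)
    loca = bytes(loca_entries * (4 if long_format else 2))
    bodies = [(b"head", bytes(head)), (b"loca", loca), (b"maxp", bytes(maxp))]
    directory = struct.pack(">IHHHH", 0x00010000, len(bodies), 0, 0, 0)
    data_start = 12 + 16 * len(bodies)
    payload = b""
    for tag, body in bodies:
        directory += struct.pack(">4sIII", tag, 0, data_start + len(payload), len(body))
        payload += body
    return directory + payload
```

For a real font you would call loca_entries_and_num_glyphs(open("subset.ttf", "rb").read()) on a hypothetically named extracted file; a result with more entries than numGlyphs + 1 is the kind of inconsistency the warnings point at.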

You then process these not-quite-'kosher' PDFs with Ghostscript. Ghostscript emits warnings, but no errors.

GS errors usually mean: "I'll abort further processing. I can't work around a problem or repair this corrupt file. Should I have written output files already, they will be useless."

GS warnings usually mean: "I've encountered a problem. But I'll continue to process the input and work around it. I've written a valid output file. But you'd better check it, especially its fidelity!"
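
If you call Ghostscript from a script, you can apply this errors-vs-warnings rule of thumb mechanically. The Python sketch below rests on two assumptions rather than documented guarantees: that a Ghostscript error shows up as a non-zero exit status, and that warnings contain the word "Warning", as in the log in the question. The function names are hypothetical.

```python
import subprocess
from typing import List, Tuple


def classify_gs_result(returncode: int, output: str) -> str:
    """Map a Ghostscript run to 'error', 'warning' or 'ok' (assumed errors-vs-warnings convention)."""
    if returncode != 0:
        return "error"  # processing aborted: treat any output file as unusable
    if "Warning" in output:
        return "warning"  # output was written, but check its fidelity
    return "ok"


def run_gs(args: List[str]) -> Tuple[str, str]:
    """Run Ghostscript and classify the result; stdout and stderr are both searched for warnings."""
    proc = subprocess.run(args, capture_output=True, text=True)
    combined = proc.stdout + proc.stderr
    return classify_gs_result(proc.returncode, combined), combined
```

On the Linux (LAMP) side the binary is usually gs, on Windows gswin32c; pass the same arguments as in the question, e.g. run_gs(["gs", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-dQUIET", "-sDEVICE=pdfwrite", "-sOutputFile=output.pdf", "input.pdf"]).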

The warnings (not errors!) you see mean this:

  1. One of the subsetted fonts in question (UUGHDE+Arial-BoldMT) claims, according to its 'loca' table, that it has 188 glyphs.
  2. But in reality its font description (numGlyphs) contains definitions for only 93 glyph shapes.
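
To list which fonts are affected and by how much, the warnings can be parsed into a ('loca' length, numGlyphs) pair per font. A minimal Python sketch, assuming one warning per line: if a console has wrapped long warning lines, join them first (output captured directly from the process should not be wrapped). The regular expression is derived only from the warning text in the question, not from any official Ghostscript message format, and the function name is hypothetical.

```python
import re
from typing import Dict, Tuple

# Matches one unwrapped Ghostscript 'loca' warning line, modelled on the log in the question
LOCA_WARNING = re.compile(
    r"'loca' length (\d+) is greater than numGlyphs (\d+) in the font (.+?)\.\s*$",
    re.MULTILINE,
)


def loca_warnings(log: str) -> Dict[str, Tuple[int, int]]:
    """Return {font name: ('loca' length, numGlyphs)}; repeated warnings collapse to one entry."""
    return {
        m.group(3): (int(m.group(1)), int(m.group(2)))
        for m in LOCA_WARNING.finditer(log)
    }
```

Because the result is keyed by font name, the duplicated warning lines in the log each collapse into a single entry.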
