How can I remove everything except bitmaps from a PDF?

Time: 2022-05-05 15:05:40

In "How can I remove all images from a PDF?", Kurt Pfeifle gave a piece of PostScript code (courtesy of Chris Liddell) that filters out all bitmaps from a PDF using Ghostscript.

This works like a charm; however, I'm also interested in the companion task of removing everything except bitmaps from the PDF, and without recompressing bitmaps. Or, eventually, separating the vector and bitmap "layers." (I know, this is not what a layer is in PDF terminology.)

AFAIU, Kurt's filter works by sending all bitmaps to a null device, while leaving everything else to pdfwrite. I read that it is possible to use different devices with GS, so my hope is that it is possible to send everything to a fake/null device by default, and only switch to pdfwrite for those images which are captured by the filter. But unfortunately I'm completely unable to translate such a thing into PostScript code.

Can anyone help, or at least tell me if this approach might be doomed to fail?

1 answer

#1


It's possible, but it's a large amount of work.

You can't start with the nulldevice and push the pdfwrite device as needed; that simply won't work, because the pdfwrite device will write out the accumulated PDF file as soon as you unload it. Reloading it will start a new PDF file.

Also, you need the same instance of the pdfwrite device for all the code, so you can't load the pdfwrite device, load the nulldevice, then load the pdfwrite device again only for the bits you want. This means the only approach which (currently) works is the one Chris wrote: load pdfwrite, and push the null device into place whenever you want to silently consume an operation.

Filtering just images takes a fairly limited amount of change, because there aren't that many operators which deal with images.

In order to remove everything except images, however, there are a lot of operators to deal with. You need to override: stroke, fill, eofill, rectstroke, rectfill, ustroke, ufill, ueofill, shfill, show, ashow, widthshow, awidthshow, xshow, xyshow, yshow, glyphshow, cshow and kshow. I might have missed a few operators, but those are the basics at least.
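To give a rough idea of what such overrides could look like, here is a minimal sketch. It is not Chris's code; every name it defines (originals, to-null, drop-path-op, drop-local-op, drop-text-op, wrap-with) is made up for illustration. It assumes the overrides are looked up by name at run time, which holds for the small demo page at the end but not necessarily for PDF input: getting Ghostscript's PDF interpreter to see redefined operators is a separate, version-dependent problem that this sketch does not address. cshow is left out because it paints nothing itself; it only calls a procedure per character.

```postscript
%!PS
% Hypothetical sketch: run selected painting operators on the null device.
% All names defined below are invented for this example.

/originals 32 dict def          % saved definitions of the wrapped operators

% Switch the current graphics state to the null device while keeping the
% CTM (nulldevice resets the matrix, so it is saved and put back).
/to-null { gsave matrix currentmatrix nulldevice setmatrix } def

% <operands> <op> drop-path-op -
% For stroke, fill and eofill, which consume the current path.
/drop-path-op  { to-null exec grestore newpath } def

% <operands> <op> drop-local-op -
% For operators that leave the current path alone.
/drop-local-op { to-null exec grestore } def

% <operands> <op> drop-text-op -
% For show-type operators: carry the advanced current point back out.
/drop-text-op  { to-null exec currentpoint grestore moveto } def

% <opname> <wrappername> wrap-with -
% Saves the current definition of opname in originals, then redefines
% opname as { originals /opname get wrappername } in the current dictionary.
/wrap-with {
  4 dict begin
    /wrapper exch def
    /opname  exch def
    originals opname opname load put
    opname [ originals opname /get cvx wrapper cvx ] cvx
  end
  def
} def

[ /stroke /fill /eofill ]
  { /drop-path-op wrap-with } forall
[ /rectstroke /rectfill /ustroke /ufill /ueofill /shfill ]
  { /drop-local-op wrap-with } forall
[ /show /ashow /widthshow /awidthshow /xshow /xyshow /yshow /glyphshow /kshow ]
  { /drop-text-op wrap-with } forall

% Demo page: the rectangle and the text go through the wrappers above,
% the image operator is left untouched.
72 72 144 72 rectfill
/Helvetica findfont 24 scalefont setfont
72 200 moveto (vector text) show
gsave
  72 300 translate 144 144 scale
  2 2 8 [2 0 0 -2 0 2] { <00ffff00> } image
grestore
showpage
```

Each wrapper still runs the real operator, so operands are consumed exactly as usual; only the output device is swapped for the duration of the call. To try the sketch, save it under any name (say sketch.ps) and run it through pdfwrite, for example with gs -o out.pdf -sDEVICE=pdfwrite sketch.ps.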

Note that the code Chris originally posted did actually filter various types of objects, not just images. You can find his code here:

http://www.ghostscript.com/~chrisl/filter-obs.ps

Please be aware this is unsupported example code only.
