Getting the size (in bytes) of a specific page in a PDF with iText

Date: 2021-05-23 22:21:44

I'm using iText (v 2.1.7) and I need to find the size, in bytes, of a specific page.

I've written the following code:

public static long[] getPageSizes(byte[] input) throws IOException {
    PdfReader reader = new PdfReader(input);
    int pageCount = reader.getNumberOfPages();
    long[] pageSizes = new long[pageCount];
    for (int i = 0; i < pageCount; i++) {
        pageSizes[i] = reader.getPageContent(i + 1).length;
    }

    reader.close();
    return pageSizes;
}

But it doesn't work properly. The reader.getPageContent(i + 1).length expression returns very small values (usually <= 100 bytes), even for pages larger than 1MB, so this is clearly not the right way to do it.
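A likely explanation for the small numbers: PdfReader.getPageContent(int) returns only the page's content stream, i.e. the drawing operators. Images, fonts and form XObjects live in separate objects referenced from the page's /Resources dictionary, and an image is merely invoked by name with the Do operator, so a page showing a 1MB image can have a content stream of a few dozen bytes. The sketch below is plain Java (no iText); the class ContentStreamScan and its helper xObjectNames are hypothetical, and the regex is a deliberately simplified scan that ignores string literals, comments and inline images. It lists the XObjects that a content stream, such as the byte[] returned by getPageContent, paints.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentStreamScan {
    // Matches "/Name Do": Do paints the XObject named by the preceding operand.
    // Simplified: does not skip string literals, comments or inline image data.
    private static final Pattern DO_OPERATOR =
            Pattern.compile("/([^\\s/\\[\\]()<>{}%]+)\\s+Do\\b");

    // Hypothetical helper: returns the XObject names painted by a page content stream.
    static List<String> xObjectNames(byte[] content) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data cannot break decoding.
        String text = new String(content, StandardCharsets.ISO_8859_1);
        List<String> names = new ArrayList<>();
        Matcher m = DO_OPERATOR.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // A typical full-page image: scale to 612x792 points, then paint the XObject /Im1.
        byte[] content = "q 612 0 0 792 0 0 cm /Im1 Do Q".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(content.length + " bytes, paints " + xObjectNames(content));
        // prints "30 bytes, paints [Im1]"
    }
}
```

The 30 bytes of content say nothing about how large /Im1 is; to account for it you would have to follow the name into the page's /Resources /XObject dictionary and measure the referenced stream, which this sketch does not attempt.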

But what IS the correct way? Is there one?

Note: I've already checked this question, but the offered solution consists of writing each page of the PDF to disk and then checking the file size, which is extremely inefficient and may even be wrong, since I'm assuming this would repeat the PDF header and metadata each time. I was searching for a more "proper" solution.

1 solution

#1

Well, in the end I managed to get hold of the source code of the original program I was working with, which only accepted input PDFs with a maximum "page size" of 1MB. Turns out... what it actually meant by "page size" was fileSize / pageCount -_-^
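For comparison, the check that program performed boils down to a plain average. A minimal sketch, assuming "1MB" means 1,048,576 bytes and that the limit is inclusive (the program's exact constant, rounding and comparison are not known); the class AveragePageSize and its methods are hypothetical names. With iText the inputs would come from input.length and reader.getNumberOfPages().

```java
public class AveragePageSize {
    // Assumption: "1MB" taken as 1024 * 1024 bytes; the original program may use 1,000,000.
    static final long MAX_AVERAGE_PAGE_SIZE = 1024L * 1024L;

    // "Page size" in the original program's sense: fileSize / pageCount (integer division).
    static long averagePageSize(long fileSizeBytes, int pageCount) {
        if (pageCount <= 0) {
            throw new IllegalArgumentException("pageCount must be positive: " + pageCount);
        }
        return fileSizeBytes / pageCount;
    }

    // Assumption: a file is accepted when its average page size does not exceed the limit.
    static boolean isAccepted(long fileSizeBytes, int pageCount) {
        return averagePageSize(fileSizeBytes, pageCount) <= MAX_AVERAGE_PAGE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(averagePageSize(3_000_000L, 3)); // 1000000
        System.out.println(isAccepted(3_000_000L, 3));      // true
        System.out.println(isAccepted(4_000_000L, 3));      // false (1333333 bytes per page)
    }
}
```

Note that this metric says nothing about any individual page: a 3-page file with one huge image page and two tiny text pages passes or fails purely on its total size.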

For anyone who actually needs the precise size of a "standalone" page, with all its content included, I've tested the following solution and it seems to work well, though it probably isn't very efficient, since it writes out an entire PDF document for each page. Using an in-memory stream instead of a disk-based one helps, but I don't know by how much.

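import java.io.ByteArrayOutputStream;
import java.io.IOException;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;

public class PdfPageSizes {
    // For each page, returns the size in bytes of a standalone one-page PDF
    // containing just that page and the resources it uses.
    public static int[] getPageSizes(byte[] input) throws IOException {
        PdfReader reader = new PdfReader(input);
        try {
            int pageCount = reader.getNumberOfPages();
            int[] pageSizes = new int[pageCount];
            for (int i = 0; i < pageCount; i++) {
                try {
                    // Copy page i + 1 (iText pages are 1-based) into a new in-memory PDF
                    Document doc = new Document();
                    ByteArrayOutputStream bous = new ByteArrayOutputStream();
                    PdfCopy copy = new PdfCopy(doc, bous);
                    doc.open();
                    PdfImportedPage page = copy.getImportedPage(reader, i + 1);
                    copy.addPage(page);
                    doc.close(); // writes the complete one-page PDF into bous
                    pageSizes[i] = bous.size();
                } catch (DocumentException e) {
                    // Fail loudly instead of silently leaving pageSizes[i] at 0
                    throw new IOException("Could not copy page " + (i + 1), e);
                }
            }
            return pageSizes;
        } finally {
            reader.close(); // release the reader even if a page fails
        }
    }
}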
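On the memory question: PdfCopy writes to any java.io.OutputStream, so instead of buffering each one-page PDF in a ByteArrayOutputStream you could give it a stream that discards the bytes and only counts them. This is an unbenchmarked sketch; CountingOutputStream is a hypothetical class written here, not part of iText or the JDK.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper: an OutputStream that throws the bytes away and only counts them.
public class CountingOutputStream extends OutputStream {
    private long count;

    @Override
    public void write(int b) {
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        // Validate the range like other streams do, then count without copying anything.
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        count += len;
    }

    public long getCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        CountingOutputStream out = new CountingOutputStream();
        out.write(new byte[1024]);     // OutputStream.write(byte[]) delegates to write(b, 0, len)
        out.write(new byte[10], 2, 5); // counts 5 bytes
        out.write('x');                // counts 1 byte
        out.close();
        System.out.println(out.getCount()); // 1030
    }
}
```

In getPageSizes above it would take the place of bous: create a CountingOutputStream, pass it to new PdfCopy(doc, counter), and after doc.close() store (int) counter.getCount(). The size reported is the same, since PdfCopy writes identical bytes whatever the target stream; only the buffering goes away.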