How do I read bookmarks at multiple levels in a PDF using iText?

Time: 2021-05-23 22:21:56

I am using iText (Java) to split PDFs at the bookmark level. Does anybody know of, or have any examples for, splitting a PDF at bookmarks that exist at level 2 or 3? For example, I have bookmarks at the following levels:

Father
|-Son
|-Son
|-Daughter
|-|-Grand son
|-|-Grand daughter

Right now I have the code below, which reads only the base-level bookmark (Father). Basically, the SimpleBookmark.getBookmark(reader) line does all the work.

But I want to read the level 2 and level 3 bookmarks to split the content present between those inner level bookmarks.

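// Imports assume iText 5 (com.itextpdf); iText 2.x uses the com.lowagie.text packages instead.
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.List;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.SimpleBookmark;

public static void splitPDFByBookmarks(String pdf, String outputFolder){
        try
        {
            PdfReader reader = new PdfReader(pdf);
            // Top-level bookmarks only: each is a map with "Title", "Page", etc.; children sit under "Kids"
            List<HashMap<String, Object>> bookmarks = SimpleBookmark.getBookmark(reader);
            for(int i=0; i<bookmarks.size(); i++){
                HashMap<String, Object> bm = bookmarks.get(i);
                HashMap<String, Object> nextBM = i==bookmarks.size()-1 ? null : bookmarks.get(i+1);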
                //In my case I needed to split the title string 
                String title = ((String)bm.get("Title")).split(" ")[2]; 

                log.debug("Title: " + title);
                String startPage = ((String)bm.get("Page")).split(" ")[0]; 
                String startPageNextBM = nextBM==null ? "" + (reader.getNumberOfPages() + 1) : ((String)nextBM.get("Page")).split(" ")[0]; 
                log.debug("Page: " + startPage); 
                log.debug("------------------"); 
                extractBookmarkToPDF(reader, Integer.valueOf(startPage), Integer.valueOf(startPageNextBM), title + ".pdf",outputFolder); 
            } 
        } 
        catch (IOException e) 
        { 
            log.error(e.getMessage()); 
        } 
    } 

    private static void extractBookmarkToPDF(PdfReader reader, int pageFrom, int pageTo, String outputName, String outputFolder){ 
        Document document = new Document(); 
        OutputStream os = null; 

        try{ 
            os = new FileOutputStream(outputFolder + outputName); 

            // Create a writer for the outputstream 
            PdfWriter writer = PdfWriter.getInstance(document, os); 
            document.open(); 
            PdfContentByte cb = writer.getDirectContent(); // Holds the PDF data 
            PdfImportedPage page; 

            while(pageFrom < pageTo) { 
                document.newPage(); 
                page = writer.getImportedPage(reader, pageFrom); 
                cb.addTemplate(page, 0, 0); 
                pageFrom++; 
            } 

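            // Closing the document finishes the PDF and, by default, closes the writer's stream;
            // the finally block below covers the failure path.
            document.close();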
        }catch(Exception ex){ 
            log.error(ex.getMessage()); 
        }finally { 
            if (document.isOpen()) 
                document.close(); 
            try { 
                if (os != null) 
                    os.close(); 
            } catch (IOException ioe) { 
                log.error(ioe.getMessage()); 
            } 
        } 
    } 

Your help is much appreciated. Thanks in advance! :)

1 Answer

#1



SimpleBookmark.getBookmark(reader) returns an ArrayList<HashMap> (cast it if you need to). Iterate through that ArrayList and inspect its structure. If a bookmark has children ("sons", as you call them), its map contains another list with the same structure, stored under the "Kids" key.
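
To make that structure concrete, here is a minimal sketch that needs no PDF and no iText on the classpath. It builds by hand the kind of list SimpleBookmark.getBookmark(reader) would return for the outline in the question and reads the second level through the "Kids" entry. The class name, the helper methods, and the page numbers are made up for illustration; only the "Title", "Page", and "Kids" keys come from iText.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

final class BookmarkStructureDemo {

    // Builds one entry shaped like a SimpleBookmark map: "Title" plus a "Page" destination string.
    static HashMap<String, Object> bookmark(String title, int page) {
        HashMap<String, Object> bm = new HashMap<>();
        bm.put("Title", title);
        bm.put("Page", page + " Fit");
        return bm;
    }

    // Appends a child to the parent's "Kids" list, creating the list on first use.
    @SuppressWarnings("unchecked")
    static void addKid(HashMap<String, Object> parent, HashMap<String, Object> kid) {
        List<HashMap<String, Object>> kids = (List<HashMap<String, Object>>) parent.get("Kids");
        if (kids == null) {
            kids = new ArrayList<>();
            parent.put("Kids", kids);
        }
        kids.add(kid);
    }

    // Hand-built stand-in for the question's outline (page numbers are invented).
    static List<HashMap<String, Object>> sample() {
        HashMap<String, Object> father = bookmark("Father", 1);
        HashMap<String, Object> daughter = bookmark("Daughter", 6);
        addKid(father, bookmark("Son", 2));
        addKid(father, bookmark("Son", 4));
        addKid(father, daughter);
        addKid(daughter, bookmark("Grand son", 6));
        addKid(daughter, bookmark("Grand daughter", 8));
        List<HashMap<String, Object>> top = new ArrayList<>();
        top.add(father);
        return top;
    }

    // Collects the titles one level below the top-level bookmarks.
    static List<String> levelTwoTitles(List<HashMap<String, Object>> topLevel) {
        List<String> titles = new ArrayList<>();
        for (HashMap<String, Object> parent : topLevel) {
            Object kids = parent.get("Kids"); // absent for leaf bookmarks
            if (kids instanceof List) {
                for (Object kid : (List<?>) kids) {
                    titles.add((String) ((HashMap<?, ?>) kid).get("Title"));
                }
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(levelTwoTitles(sample())); // prints [Son, Son, Daughter]
    }
}
```

With a real document you would pass the result of SimpleBookmark.getBookmark(reader) to levelTwoTitles instead of sample(). Nested loops like this stop being practical once you need level 3 and beyond.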

A recursive method could be the solution.
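
One possible shape for such a recursive method is sketched below. It walks the "Kids" lists down to a chosen depth, flattens the outline into document order, and derives a page range for each entry in the same way the question's code does (the end page is the start page of the next entry, exclusive). Everything except the "Title", "Page", and "Kids" keys is hypothetical: the class, the Entry type, the node helper, and the sample data stand in for what SimpleBookmark.getBookmark(reader) and reader.getNumberOfPages() would supply. The sketch also assumes the bookmarks appear in ascending page order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

final class BookmarkFlattener {

    // One outline entry: its title, nesting level (1 = top level), and first page.
    static final class Entry {
        final String title;
        final int level;
        final int startPage;

        Entry(String title, int level, int startPage) {
            this.title = title;
            this.level = level;
            this.startPage = startPage;
        }
    }

    // Depth-first walk: records each bookmark, then descends into its "Kids" up to maxLevel.
    static void flatten(List<HashMap<String, Object>> bookmarks, int level, int maxLevel, List<Entry> out) {
        if (bookmarks == null || level > maxLevel) {
            return;
        }
        for (HashMap<String, Object> bm : bookmarks) {
            String page = (String) bm.get("Page");
            if (page != null) { // bookmarks without a page destination are skipped
                int start = Integer.parseInt(page.trim().split(" ")[0]);
                out.add(new Entry((String) bm.get("Title"), level, start));
            }
            @SuppressWarnings("unchecked")
            List<HashMap<String, Object>> kids = (List<HashMap<String, Object>>) bm.get("Kids");
            flatten(kids, level + 1, maxLevel, out);
        }
    }

    // Returns {from, toExclusive} per entry; the last entry runs to the end of the document.
    static List<int[]> pageRanges(List<Entry> entries, int numberOfPages) {
        List<int[]> ranges = new ArrayList<>();
        for (int i = 0; i < entries.size(); i++) {
            int from = entries.get(i).startPage;
            int to = i + 1 < entries.size() ? entries.get(i + 1).startPage : numberOfPages + 1;
            ranges.add(new int[] {from, to});
        }
        return ranges;
    }

    // Hypothetical helper that builds a bookmark map by hand, with optional children.
    @SafeVarargs
    static HashMap<String, Object> node(String title, int page, HashMap<String, Object>... kids) {
        HashMap<String, Object> bm = new HashMap<>();
        bm.put("Title", title);
        bm.put("Page", page + " Fit");
        if (kids.length > 0) {
            bm.put("Kids", new ArrayList<>(Arrays.asList(kids)));
        }
        return bm;
    }

    // The outline from the question, with invented page numbers.
    static List<HashMap<String, Object>> sample() {
        List<HashMap<String, Object>> top = new ArrayList<>();
        top.add(node("Father", 1,
                node("Son", 2),
                node("Son", 4),
                node("Daughter", 6, node("Grand son", 6), node("Grand daughter", 8))));
        return top;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        flatten(sample(), 1, 3, entries);
        List<int[]> ranges = pageRanges(entries, 9); // 9 = assumed page count
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            System.out.println(e.level + " " + e.title + " [" + ranges.get(i)[0] + ", " + ranges.get(i)[1] + ")");
        }
    }
}
```

Each {from, to} pair could then be handed to extractBookmarkToPDF from the question. Note that a parent whose first child starts on the same page ("Daughter" in the sample) gets an empty range, so you may want to skip empty ranges or split only on the deepest level. Titles can also repeat ("Son" appears twice), so using the title alone as the file name would overwrite output.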
