如何使用iText java在PDF中读取表格?

时间:2021-05-23 22:22:08

I dont have much idea on pdf processing using java.I want to read a table in a PDF file using the iText java library. How to proceed?

对于使用java进行pdf处理,我没有太多的想法。我想使用iText java库在PDF文件中读取一个表。如何进行呢?

3 个解决方案

#1


6  

You can extract text from a content stream, but for ordinary PDFs, the result will be plain text (without any structure). If there's a table on the page, that table won't be recognized as such. You'll get the content and some white space, but that's not a tabular structure! Only if you have a tagged PDF, you can obtain an XML-file. If the PDF contains tags that are recognized as table tags, this will be reflected in the PDF.

您可以从内容流中提取文本,但是对于普通的pdf文件,结果将是纯文本(没有任何结构)。如果页面上有一个表,那么该表就不会被识别。你会得到内容和一些空白,但那不是表格结构!只有当您有一个标记的PDF时,您才能获得一个xml文件。如果PDF包含被识别为表标记的标记,那么这将在PDF中反映出来。

That's what I found out here

这就是我在这里发现的。

#2


1  

For reading content of the table from a PDF file, you only have to convert the PDF into a text file by using any API (I have used PdfTextExtracter.getTextFromPage() of iText) and then read that txt file by your Java program. After reading it the major task is done. You have to filter the data that you need, which you can do by continuously using split method of String class until you find the record you want.

为了从PDF文件中读取表的内容,只需使用任何API将PDF转换成文本文件(我使用了PdfTextExtracter.getTextFromPage()),然后通过Java程序读取该txt文件。读完之后,主要任务就完成了。您必须过滤您需要的数据,您可以通过连续使用String类的split方法来完成,直到找到您想要的记录为止。

Below is my code in which I have extracted part of the record from a PDF file and written it into a .CSV file. You can view the PDF file here: http://www.cea.nic.in/reports/monthly/generation_rep/actual/jan13/opm_02.pdf

下面是我从PDF文件中提取部分记录并将其写入. csv文件中的代码。你可以在这里查看PDF文件:http://www.cea.nic.in/reports/monthly/generation_rep/actual/jan13/opm_02.pdf。

public static void genrateCsvMonth_Region(String pdfpath, String csvpath) {
        try {
            String line = null;
            // Appending Header in CSV file...
            BufferedWriter writer1 = new BufferedWriter(new FileWriter(csvpath,
                    true));
            writer1.close();
            // Checking whether file is empty or not..
            BufferedReader br = new BufferedReader(new FileReader(csvpath));
                         if ((line = br.readLine()) == null) {
                BufferedWriter writer = new BufferedWriter(new FileWriter(
                        csvpath, true));
                writer.append("REGION,");
                writer.append("YEAR,");
                writer.append("MONTH,");
                writer.append("THERMAL,");
                writer.append("NUCLEAR,");
                writer.append("HYDRO,");
                writer.append("TOTAL\n");
                writer.close();
            }
            // Reading the pdf file..
            PdfReader reader = new PdfReader(pdfpath);
            BufferedWriter writer = new BufferedWriter(new FileWriter(csvpath,
                    true));

            // Extracting records from page into String..
            String page = PdfTextExtractor.getTextFromPage(reader, 1);
            // Extracting month and Year from String..
            String period1[] = page.split("PEROID");
            String period2[] = period1[0].split(":");
            String month[] = period2[1].split("-");
            String period3[] = month[1].split("ENERGY");
            String year[] = period3[0].split("VIS");

            // Extracting Northen region
            String northen[] = page.split("NORTHEN REGION");
            String nthermal1[] = northen[0].split("THERMAL");
            String nthermal2[] = nthermal1[1].split(" ");

            String nnuclear1[] = northen[0].split("NUCLEAR");
            String nnuclear2[] = nnuclear1[1].split(" ");

            String nhydro1[] = northen[0].split("HYDRO");
            String nhydro2[] = nhydro1[1].split(" ");

            String ntotal1[] = northen[0].split("TOTAL");
            String ntotal2[] = ntotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("NORTHEN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(nthermal2[4] + ",");
            writer.append(nnuclear2[4] + ",");
            writer.append(nhydro2[4] + ",");
            writer.append(ntotal2[4] + "\n");

            // Extracting Western region
            String western[] = page.split("WESTERN");

            String wthermal1[] = western[1].split("THERMAL");
            String wthermal2[] = wthermal1[1].split(" ");

            String wnuclear1[] = western[1].split("NUCLEAR");
            String wnuclear2[] = wnuclear1[1].split(" ");

            String whydro1[] = western[1].split("HYDRO");
            String whydro2[] = whydro1[1].split(" ");

            String wtotal1[] = western[1].split("TOTAL");
            String wtotal2[] = wtotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("WESTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(wthermal2[4] + ",");
            writer.append(wnuclear2[4] + ",");
            writer.append(whydro2[4] + ",");
            writer.append(wtotal2[4] + "\n");

            // Extracting Southern Region
            String southern[] = page.split("SOUTHERN");

            String sthermal1[] = southern[1].split("THERMAL");
            String sthermal2[] = sthermal1[1].split(" ");

            String snuclear1[] = southern[1].split("NUCLEAR");
            String snuclear2[] = snuclear1[1].split(" ");

            String shydro1[] = southern[1].split("HYDRO");
            String shydro2[] = shydro1[1].split(" ");

            String stotal1[] = southern[1].split("TOTAL");
            String stotal2[] = stotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("SOUTHERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(sthermal2[4] + ",");
            writer.append(snuclear2[4] + ",");
            writer.append(shydro2[4] + ",");
            writer.append(stotal2[4] + "\n");

            // Extracting eastern region
            String eastern[] = page.split("EASTERN");

            String ethermal1[] = eastern[1].split("THERMAL");
            String ethermal2[] = ethermal1[1].split(" ");

            String ehydro1[] = eastern[1].split("HYDRO");
            String ehydro2[] = ehydro1[1].split(" ");

            String etotal1[] = eastern[1].split("TOTAL");
            String etotal2[] = etotal1[1].split(" ");
            // Appending filtered data into CSV file..
            writer.append("EASTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(ethermal2[4] + ",");
            writer.append(" " + ",");
            writer.append(ehydro2[4] + ",");
            writer.append(etotal2[4] + "\n");

            // Extracting northernEastern region
            String neestern[] = page.split("NORTH");

            String nethermal1[] = neestern[2].split("THERMAL");
            String nethermal2[] = nethermal1[1].split(" ");

            String nehydro1[] = neestern[2].split("HYDRO");
            String nehydro2[] = nehydro1[1].split(" ");

            String netotal1[] = neestern[2].split("TOTAL");
            String netotal2[] = netotal1[1].split(" ");

            writer.append("NORTH EASTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(nethermal2[4] + ",");
            writer.append(" " + ",");
            writer.append(nehydro2[4] + ",");
            writer.append(netotal2[4] + "\n");
            writer.close();

        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

    }

#3


-3  

My Solution

我的解决方案

package com.geek.tutorial.itext.table;
import java.io.FileOutputStream;
import com.lowagie.text.pdf.PdfPTable;
import com.lowagie.text.pdf.PdfPCell;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;

public class SimplePDFTable
{
    public SimplePDFTable() throws Exception
    {
        Document document = new Document();
        PdfWriter.getInstance(document, 
            new FileOutputStream("SimplePDFTable.pdf"));
        document.open();
        PdfPTable table = new PdfPTable(2); // Code 1
        // Code 2
        table.addCell("1");
        table.addCell("2");
        // Code 3
        table.addCell("3");
        table.addCell("4");
        // Code 4
        table.addCell("5");
        table.addCell("6");
        // Code 5
        document.add(table);        
        document.close();
    }

    public static void main(String[] args)
    {    
        try
        {
            SimplePDFTable pdfTable = new SimplePDFTable();
        }
        catch(Exception e)
        {
            System.out.println(e);
        }
    }
}

#1


6  

You can extract text from a content stream, but for ordinary PDFs, the result will be plain text (without any structure). If there's a table on the page, that table won't be recognized as such. You'll get the content and some white space, but that's not a tabular structure! Only if you have a tagged PDF, you can obtain an XML-file. If the PDF contains tags that are recognized as table tags, this will be reflected in the PDF.

您可以从内容流中提取文本,但是对于普通的pdf文件,结果将是纯文本(没有任何结构)。如果页面上有一个表,那么该表就不会被识别。你会得到内容和一些空白,但那不是表格结构!只有当您有一个标记的PDF时,您才能获得一个xml文件。如果PDF包含被识别为表标记的标记,那么这将在PDF中反映出来。

That's what I found out here

这就是我在这里发现的。

#2


1  

For reading content of the table from a PDF file, you only have to convert the PDF into a text file by using any API (I have used PdfTextExtracter.getTextFromPage() of iText) and then read that txt file by your Java program. After reading it the major task is done. You have to filter the data that you need, which you can do by continuously using split method of String class until you find the record you want.

为了从PDF文件中读取表的内容,只需使用任何API将PDF转换成文本文件(我使用了PdfTextExtracter.getTextFromPage()),然后通过Java程序读取该txt文件。读完之后,主要任务就完成了。您必须过滤您需要的数据,您可以通过连续使用String类的split方法来完成,直到找到您想要的记录为止。

Below is my code in which I have extracted part of the record from a PDF file and written it into a .CSV file. You can view the PDF file here: http://www.cea.nic.in/reports/monthly/generation_rep/actual/jan13/opm_02.pdf

下面是我从PDF文件中提取部分记录并将其写入. csv文件中的代码。你可以在这里查看PDF文件:http://www.cea.nic.in/reports/monthly/generation_rep/actual/jan13/opm_02.pdf。

public static void genrateCsvMonth_Region(String pdfpath, String csvpath) {
        try {
            String line = null;
            // Appending Header in CSV file...
            BufferedWriter writer1 = new BufferedWriter(new FileWriter(csvpath,
                    true));
            writer1.close();
            // Checking whether file is empty or not..
            BufferedReader br = new BufferedReader(new FileReader(csvpath));
                         if ((line = br.readLine()) == null) {
                BufferedWriter writer = new BufferedWriter(new FileWriter(
                        csvpath, true));
                writer.append("REGION,");
                writer.append("YEAR,");
                writer.append("MONTH,");
                writer.append("THERMAL,");
                writer.append("NUCLEAR,");
                writer.append("HYDRO,");
                writer.append("TOTAL\n");
                writer.close();
            }
            // Reading the pdf file..
            PdfReader reader = new PdfReader(pdfpath);
            BufferedWriter writer = new BufferedWriter(new FileWriter(csvpath,
                    true));

            // Extracting records from page into String..
            String page = PdfTextExtractor.getTextFromPage(reader, 1);
            // Extracting month and Year from String..
            String period1[] = page.split("PEROID");
            String period2[] = period1[0].split(":");
            String month[] = period2[1].split("-");
            String period3[] = month[1].split("ENERGY");
            String year[] = period3[0].split("VIS");

            // Extracting Northen region
            String northen[] = page.split("NORTHEN REGION");
            String nthermal1[] = northen[0].split("THERMAL");
            String nthermal2[] = nthermal1[1].split(" ");

            String nnuclear1[] = northen[0].split("NUCLEAR");
            String nnuclear2[] = nnuclear1[1].split(" ");

            String nhydro1[] = northen[0].split("HYDRO");
            String nhydro2[] = nhydro1[1].split(" ");

            String ntotal1[] = northen[0].split("TOTAL");
            String ntotal2[] = ntotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("NORTHEN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(nthermal2[4] + ",");
            writer.append(nnuclear2[4] + ",");
            writer.append(nhydro2[4] + ",");
            writer.append(ntotal2[4] + "\n");

            // Extracting Western region
            String western[] = page.split("WESTERN");

            String wthermal1[] = western[1].split("THERMAL");
            String wthermal2[] = wthermal1[1].split(" ");

            String wnuclear1[] = western[1].split("NUCLEAR");
            String wnuclear2[] = wnuclear1[1].split(" ");

            String whydro1[] = western[1].split("HYDRO");
            String whydro2[] = whydro1[1].split(" ");

            String wtotal1[] = western[1].split("TOTAL");
            String wtotal2[] = wtotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("WESTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(wthermal2[4] + ",");
            writer.append(wnuclear2[4] + ",");
            writer.append(whydro2[4] + ",");
            writer.append(wtotal2[4] + "\n");

            // Extracting Southern Region
            String southern[] = page.split("SOUTHERN");

            String sthermal1[] = southern[1].split("THERMAL");
            String sthermal2[] = sthermal1[1].split(" ");

            String snuclear1[] = southern[1].split("NUCLEAR");
            String snuclear2[] = snuclear1[1].split(" ");

            String shydro1[] = southern[1].split("HYDRO");
            String shydro2[] = shydro1[1].split(" ");

            String stotal1[] = southern[1].split("TOTAL");
            String stotal2[] = stotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("SOUTHERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(sthermal2[4] + ",");
            writer.append(snuclear2[4] + ",");
            writer.append(shydro2[4] + ",");
            writer.append(stotal2[4] + "\n");

            // Extracting eastern region
            String eastern[] = page.split("EASTERN");

            String ethermal1[] = eastern[1].split("THERMAL");
            String ethermal2[] = ethermal1[1].split(" ");

            String ehydro1[] = eastern[1].split("HYDRO");
            String ehydro2[] = ehydro1[1].split(" ");

            String etotal1[] = eastern[1].split("TOTAL");
            String etotal2[] = etotal1[1].split(" ");
            // Appending filtered data into CSV file..
            writer.append("EASTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(ethermal2[4] + ",");
            writer.append(" " + ",");
            writer.append(ehydro2[4] + ",");
            writer.append(etotal2[4] + "\n");

            // Extracting northernEastern region
            String neestern[] = page.split("NORTH");

            String nethermal1[] = neestern[2].split("THERMAL");
            String nethermal2[] = nethermal1[1].split(" ");

            String nehydro1[] = neestern[2].split("HYDRO");
            String nehydro2[] = nehydro1[1].split(" ");

            String netotal1[] = neestern[2].split("TOTAL");
            String netotal2[] = netotal1[1].split(" ");

            writer.append("NORTH EASTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(nethermal2[4] + ",");
            writer.append(" " + ",");
            writer.append(nehydro2[4] + ",");
            writer.append(netotal2[4] + "\n");
            writer.close();

        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

    }

#3


-3  

My Solution

我的解决方案

package com.geek.tutorial.itext.table;
import java.io.FileOutputStream;
import com.lowagie.text.pdf.PdfPTable;
import com.lowagie.text.pdf.PdfPCell;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;

public class SimplePDFTable
{
    public SimplePDFTable() throws Exception
    {
        Document document = new Document();
        PdfWriter.getInstance(document, 
            new FileOutputStream("SimplePDFTable.pdf"));
        document.open();
        PdfPTable table = new PdfPTable(2); // Code 1
        // Code 2
        table.addCell("1");
        table.addCell("2");
        // Code 3
        table.addCell("3");
        table.addCell("4");
        // Code 4
        table.addCell("5");
        table.addCell("6");
        // Code 5
        document.add(table);        
        document.close();
    }

    public static void main(String[] args)
    {    
        try
        {
            SimplePDFTable pdfTable = new SimplePDFTable();
        }
        catch(Exception e)
        {
            System.out.println(e);
        }
    }
}