Various Methods for Converting PDF to Images

Date: 2024-02-24 08:42:08

Method 1: Acrobat Scripting

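    // "/c/test.pdf" is Acrobat's device-independent form of C:\test.pdf
    var doc = app.openDoc('/c/test.pdf');

    // Save using Acrobat's JPEG conversion ID
    doc.saveAs("p.jpg", "com.adobe.acrobat.jpeg");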

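    Note: This method is extremely simple and fast, but the generated images can only be named in the test_Page_01.jpg pattern. For details, see the Adobe Acrobat JavaScript development documentation.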
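Because Method 1 fixes the output names to the test_Page_01.jpg pattern, a small post-processing step can rename the files afterwards. The sketch below assumes the files follow exactly the `<name>_Page_<number>.jpg` pattern described in the note and renames each one to a zero-padded page number such as 0001.jpg, the scheme Method 2 uses. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AcrobatPageNames {
    // Matches names such as "test_Page_01.jpg"; group 1 = base name, group 2 = page number
    private static final Pattern PAGE_NAME =
            Pattern.compile("^(.+)_Page_(\\d+)\\.jpg$", Pattern.CASE_INSENSITIVE);

    // Returns the new "0001.jpg"-style name, or empty if the name does not match the pattern
    public static Optional<String> toPaddedName(String acrobatName) {
        Matcher m = PAGE_NAME.matcher(acrobatName);
        if (!m.matches()) {
            return Optional.empty();
        }
        int page = Integer.parseInt(m.group(2));
        return Optional.of(String.format("%04d.jpg", page));
    }

    // Renames every matching file in dir and returns how many were renamed.
    // Files.move throws FileAlreadyExistsException instead of overwriting, e.g. when
    // outputs of two different PDFs in the same folder map to the same page number.
    public static int renameAll(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                files.add(p); // collect first so renaming does not interfere with iteration
            }
        }
        int count = 0;
        for (Path file : files) {
            Optional<String> newName = toPaddedName(file.getFileName().toString());
            if (newName.isPresent()) {
                Files.move(file, file.resolveSibling(newName.get()));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(toPaddedName("test_Page_01.jpg").orElse("?"));     // 0001.jpg
        System.out.println(toPaddedName("notes.txt").orElse("no match"));     // no match
    }
}
```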

 

Method 2: Generating images with .NET. The code is as follows:

        

using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Runtime.InteropServices;
using System.Windows.Forms;

class Pdf2Image
{
    [STAThread] // the Clipboard API requires a single-threaded apartment (STA)
    public static void Main(string[] args) {

        if (args.Length != 1) {
            Console.WriteLine("Usage: Pdf2Image <path to PDF file>");
            return;
        }

        string pdfFilePath = args[0];
        if (!File.Exists(pdfFilePath)) {
            Console.WriteLine("File \"{0}\" does not exist", pdfFilePath);
            return;
        }

        FileInfo pdfFi = new FileInfo(pdfFilePath);
        pdfFilePath = pdfFi.FullName;

        // Images go into a folder next to the PDF, named after the PDF without its extension
        string imageDirectoryPath = Path.Combine(
            pdfFi.DirectoryName,
            Path.GetFileNameWithoutExtension(pdfFi.Name));

        try {
            ConvertPdf2Image(pdfFilePath, imageDirectoryPath, 0, 0, null, 1);
        }
        catch (Exception ex) {
            Console.WriteLine(ex.ToString());
        }

        Console.Read();
    }

    public static void ConvertPdf2Image(string pdfFilePath, string imageDirectoryPath,
        int beginPageNum, int endPageNum, ImageFormat format, double zoom = 1) {

        // Create the COM object used to work with the PDF file
        Acrobat.CAcroPDDoc pdfDoc = (Acrobat.CAcroPDDoc)Microsoft.VisualBasic.Interaction.CreateObject("AcroExch.PDDoc", "");

        try {
            // Check the input parameters
            if (!pdfDoc.Open(pdfFilePath)) {
                throw new FileNotFoundException(string.Format("Source file {0} does not exist!", pdfFilePath));
            }

            if (!Directory.Exists(imageDirectoryPath)) {
                Directory.CreateDirectory(imageDirectoryPath);
            }

            if (beginPageNum <= 0) {
                beginPageNum = 1;
            }

            if (endPageNum > pdfDoc.GetNumPages() || endPageNum <= 0) {
                endPageNum = pdfDoc.GetNumPages();
            }

            if (beginPageNum > endPageNum) {
                throw new ArgumentException("Parameter \"beginPageNum\" must not be greater than \"endPageNum\"!");
            }

            if (format == null) {
                format = ImageFormat.Png;
            }

            if (zoom <= 0) {
                zoom = 1;
            }

            // Convert page by page
            for (int i = beginPageNum; i <= endPageNum; i++) {
                // Acquire the current page (AcquirePage is zero-based)
                Acrobat.CAcroPDPage pdfPage = (Acrobat.CAcroPDPage)pdfDoc.AcquirePage(i - 1);
                // Get the size of the current page
                Acrobat.CAcroPoint pdfPoint = (Acrobat.CAcroPoint)pdfPage.GetSize();
                // Create a rectangle describing the region of the page to copy
                Acrobat.CAcroRect pdfRect = (Acrobat.CAcroRect)Microsoft.VisualBasic.Interaction.CreateObject("AcroExch.Rect", "");

                // Scaled width and height of the page; zoom == 1 keeps the original size
                int imgWidth = (int)((double)pdfPoint.x * zoom);
                int imgHeight = (int)((double)pdfPoint.y * zoom);

                // Make the copy rectangle as large as the (scaled) page
                pdfRect.Left = 0;
                pdfRect.right = (short)imgWidth;
                pdfRect.Top = 0;
                pdfRect.bottom = (short)imgHeight;

                // Render the region as an image and copy it to the clipboard
                pdfPage.CopyToClipboard(pdfRect, 0, 0, (short)(100 * zoom));

                IDataObject clipboardData = Clipboard.GetDataObject();

                // If the clipboard holds a bitmap, save it in the requested format
                if (clipboardData.GetDataPresent(DataFormats.Bitmap)) {
                    using (Bitmap pdfBitmap = (Bitmap)clipboardData.GetData(DataFormats.Bitmap)) {
                        pdfBitmap.Save(
                            Path.Combine(imageDirectoryPath, i.ToString("0000") + "." + format.ToString().ToLower()), format);
                    }
                }

                // Release the per-page COM objects on every iteration, not only after the last page
                Marshal.ReleaseComObject(pdfRect);
                Marshal.ReleaseComObject(pdfPoint);
                Marshal.ReleaseComObject(pdfPage);
            }
        }
        finally {
            // Close the document and release the COM object even if an exception occurred
            pdfDoc.Close();
            Marshal.ReleaseComObject(pdfDoc);
        }
    }
}

 

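Note: This method also depends on Acrobat. In the C# project you need to add a COM reference to Acrobat and a reference to Microsoft.VisualBasic.dll (look for it under Microsoft.NET/Framework/V... on the local machine); the Clipboard and Bitmap classes additionally need references to System.Windows.Forms and System.Drawing. In addition, when moving to a different server you need to adjust the DCOM configuration (dcomcnfg) to change the ActiveX permissions.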
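In Method 2 the output size is the page size multiplied by zoom. GetSize reports the page size in points (1/72 inch), and the rectangle computation treats one point as one pixel at zoom == 1. Under that assumption, a target resolution corresponds to zoom = dpi / 72. The following Java sketch shows the arithmetic; the class and method names are hypothetical.

```java
public final class PageScale {
    // PDF user space: 72 points per inch
    public static final double POINTS_PER_INCH = 72.0;

    // Zoom factor for a target resolution, assuming 1 point = 1 pixel at zoom 1
    public static double zoomForDpi(double dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return dpi / POINTS_PER_INCH;
    }

    // Pixel size for a page dimension in points, truncated like the (int) cast in the C# code
    public static int pixels(double points, double zoom) {
        return (int) (points * zoom);
    }

    public static void main(String[] args) {
        double zoom = zoomForDpi(150);  // 150 / 72, about 2.083
        // A4 portrait is about 595 x 842 points
        System.out.println(pixels(595, zoom) + " x " + pixels(842, zoom)); // 1239 x 1754
    }
}
```

Passing this zoom into ConvertPdf2Image would then request roughly 150 DPI output, subject to the same 1 point = 1 pixel assumption.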

 

Method 3: Using Apache PDFBox. Java code snippet:

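import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFImageWriter; // PDFBox 1.x API (removed in 2.x)

PDDocument document = PDDocument.load("C:/test.pdf");
PDFImageWriter imageWriter = new PDFImageWriter();
// args: document, image format, password ("" = none), start page, end page, output file-name prefix
imageWriter.writeImage(document, "jpg", "", 1, 1, "p1");
document.close();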

 

           

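Note: This is the most widely used method, but also the most criticized. It fails when generating images from large PDFs with hundreds or thousands of pages, and the PDFBox source code has to be modified to work around this.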
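The note does not say which source changes were needed. One mitigation that leaves the library untouched is to call writeImage on smaller page ranges through its startPage/endPage arguments, instead of converting the whole document in one call. It has not been verified against the errors described above. The JDK-only sketch below only computes those 1-based, inclusive ranges; the class name and chunk size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public final class PageChunks {
    // Splits pages 1..pageCount into inclusive {start, end} ranges of at most chunkSize pages
    public static List<int[]> split(int pageCount, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 1; start <= pageCount; start += chunkSize) {
            int end = Math.min(start + chunkSize - 1, pageCount);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each range could be passed as writeImage(document, "jpg", "", r[0], r[1], prefix)
        for (int[] r : split(250, 100)) {
            System.out.println(r[0] + "-" + r[1]); // prints 1-100, 101-200, 201-250
        }
    }
}
```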

 

Method 4: Using PDFRenderer. Java code snippet:

First download the PDFRenderer jar and add it to your Java project.

Download: https://java.net/projects/pdf-renderer/downloads

import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;

import java.awt.Graphics;
import java.awt.Image;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import javax.imageio.ImageIO;

public class Pdf2Jpg {

    public void convert() throws Exception {
        // Load the PDF into a memory-mapped buffer
        File file = new File("c:/test.pdf");
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             FileChannel channel = raf.getChannel()) {
            ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            PDFFile pdffile = new PDFFile(buf);
            // Number of pages in the PDF
            int numPages = pdffile.getNumPages();
            // Render each page to its own image (PDFRenderer page numbers start at 1)
            for (int i = 1; i <= numPages; i++) {
                PDFPage page = pdffile.getPage(i);
                // Render the whole page at its natural size
                Rectangle rect = new Rectangle(0, 0, (int) page.getWidth(), (int) page.getHeight());
                Image img = page.getImage(
                        rect.width, rect.height, // width & height
                        rect,                    // clip rect
                        null,                    // null for the ImageObserver
                        true,                    // fill background with white
                        true                     // block until drawing is done
                );
                // JPEG has no alpha channel, so copy into an opaque RGB image
                BufferedImage bufferedImage = new BufferedImage(rect.width, rect.height, BufferedImage.TYPE_INT_RGB);
                Graphics g = bufferedImage.createGraphics();
                g.drawImage(img, 0, 0, null);
                g.dispose();
                File outFile = new File("c:/p" + i + ".jpg");
                if (outFile.exists()) {
                    outFile.delete();
                }
                ImageIO.write(bufferedImage, "jpg", outFile);
            }
        }
    }
}
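The last steps of the Method 4 loop use only Java 2D and ImageIO, not PDFRenderer. They copy the rendered Image into an opaque TYPE_INT_RGB buffer, because JPEG has no alpha channel, and then write it with ImageIO. The standalone sketch below isolates that step so it can be tried without a PDF. The helper name and the synthetic ARGB test image are hypothetical stand-ins for the rendered page.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public final class JpegWriter {
    // Copies any Image into an opaque RGB buffer and writes it as JPEG;
    // returns false if ImageIO has no writer for the image/format combination
    public static boolean writeJpeg(Image img, int width, int height, File out) throws IOException {
        BufferedImage rgb = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);          // white background under any transparent areas
            g.fillRect(0, 0, width, height);
            g.drawImage(img, 0, 0, null);
        } finally {
            g.dispose();
        }
        return ImageIO.write(rgb, "jpg", out);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a rendered page: a black rectangle on a transparent 200x100 ARGB image
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = page.createGraphics();
        g.setColor(Color.BLACK);
        g.fillRect(20, 20, 50, 30);
        g.dispose();

        File out = File.createTempFile("p1-", ".jpg");
        System.out.println(writeJpeg(page, 200, 100, out) + " " + out.length());
    }
}
```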

 

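There are also many PDF-processing libraries from outside China that readers may want to look at, for example the JPedal PDF library. It can not only generate images but also convert PDFs to HTML very faithfully and extract text coordinates, which makes fast in-browser viewing of PDFs possible. PDFBox can also read text sizes and coordinates and can produce HTML, but its results are considerably worse than JPedal's.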

For batch processing I used Method 4; for individual large PDFs I chose Method 1. All of the code has been fully debugged and tested!
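For batch processing, the per-file conversion can be driven by a simple loop over a folder. The JDK-only sketch below assumes one output folder per PDF, named after the file, which is the convention Method 2's Main uses. The PdfConverter interface is hypothetical. It would wrap a routine such as Method 4's convert(), adapted to take input and output paths.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public final class BatchConvert {
    // Hypothetical hook: converts one PDF into images inside outputDir
    public interface PdfConverter {
        void convert(Path pdf, Path outputDir) throws Exception;
    }

    // Output folder next to the PDF, named after the file without its extension
    public static Path outputDirFor(Path pdf) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base);
    }

    // Converts every *.pdf in dir; failures are collected so one bad file does not stop the batch
    public static List<String> convertAll(Path dir, PdfConverter converter) throws IOException {
        List<String> failures = new ArrayList<>();
        try (DirectoryStream<Path> pdfs = Files.newDirectoryStream(dir, "*.pdf")) {
            for (Path pdf : pdfs) {
                Path outDir = outputDirFor(pdf);
                try {
                    Files.createDirectories(outDir);
                    converter.convert(pdf, outDir);
                } catch (Exception e) {
                    failures.add(pdf.getFileName() + ": " + e.getMessage());
                }
            }
        }
        return failures;
    }
}
```

Collecting failures instead of stopping on the first exception means a single damaged file does not abort a long batch. The returned list can be logged or retried afterwards.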