Method 1: Acrobat Scripting
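```javascript
// Run in a privileged context (e.g. the JavaScript console); saveAs is security-restricted
var doc = app.openDoc("/c/test.pdf");
doc.saveAs("/c/test.jpg", "com.adobe.acrobat.jpeg"); // one JPEG per page
doc.closeDoc(true); // close the PDF without saving
```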
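Note: this method is extremely simple and fast, but the generated image names always follow a fixed pattern such as test_Page_01.jpg. For details, see the Adobe Acrobat JavaScript development documentation.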
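Since the naming pattern is fixed, one option is to rename the files after export. The sketch below is in Java (the language used by the later methods) and assumes the exported files follow the <base>_Page_<number>.jpg pattern from the note; the class AcrobatPageRenamer and the target scheme p<page>.jpg are hypothetical choices for illustration, not part of Acrobat.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AcrobatPageRenamer {
    // Assumed shape of Acrobat's output names: <base>_Page_<digits>.jpg, e.g. test_Page_01.jpg
    private static final Pattern ACROBAT_NAME =
            Pattern.compile("^(.+)_Page_(\\d+)\\.jpe?g$", Pattern.CASE_INSENSITIVE);

    /** Returns the page number encoded in an Acrobat output name, or -1 if it does not match. */
    static int pageNumber(String fileName) {
        Matcher m = ACROBAT_NAME.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    /** Hypothetical target scheme p<page>.jpg, e.g. test_Page_01.jpg -> p1.jpg; null if no match. */
    static String targetName(String fileName) {
        int page = pageNumber(fileName);
        return page < 0 ? null : "p" + page + ".jpg";
    }

    /** Renames every matching file in dir and returns how many files were renamed. */
    static int renameAll(Path dir) throws IOException {
        // Collect first, then rename, so the directory is not modified while it is being listed
        List<Path> sources = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (targetName(f.getFileName().toString()) != null) {
                    sources.add(f);
                }
            }
        }
        for (Path f : sources) {
            Path target = f.resolveSibling(targetName(f.getFileName().toString()));
            Files.move(f, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return sources.size();
    }
}
```

Calling renameAll on the folder the JPEGs were exported to would turn test_Page_01.jpg into p1.jpg. REPLACE_EXISTING overwrites any file that already has the target name, so a dedicated output folder is the safer choice.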
Method 2: Using .NET to generate the images. The code is as follows:
```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Runtime.InteropServices;
using System.Windows.Forms;

class Pdf2Image
{
    // The clipboard can only be used from an STA thread, hence [STAThread]
    [STAThread]
    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: Pdf2Image <PDF file path>");
            return;
        }
        string pdfFilePath = args[0];
        if (!File.Exists(pdfFilePath))
        {
            Console.WriteLine("File \"{0}\" does not exist", pdfFilePath);
            return;
        }
        FileInfo pdfFi = new FileInfo(pdfFilePath);
        pdfFilePath = pdfFi.FullName;
        // Images go into a folder next to the PDF, named after the PDF without its extension
        string imageDirectoryPath = Path.Combine(
            pdfFi.DirectoryName, Path.GetFileNameWithoutExtension(pdfFi.Name));
        try
        {
            // 0, 0 = all pages; null = PNG; zoom 1 = original size
            ConvertPdf2Image(pdfFilePath, imageDirectoryPath, 0, 0, null, 1);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.ToString());
        }
        Console.Read();
    }

    public static void ConvertPdf2Image(string pdfFilePath, string imageDirectoryPath,
        int beginPageNum, int endPageNum, ImageFormat format, double zoom = 1)
    {
        // Create the COM object used to manipulate the PDF file
        Acrobat.CAcroPDDoc pdfDoc = (Acrobat.CAcroPDDoc)Microsoft.VisualBasic.Interaction.CreateObject("AcroExch.PDDoc", "");
        try
        {
            // Validate the input parameters
            if (!pdfDoc.Open(pdfFilePath))
            {
                throw new FileNotFoundException(string.Format("Could not open source file {0}!", pdfFilePath));
            }
            if (!Directory.Exists(imageDirectoryPath))
            {
                Directory.CreateDirectory(imageDirectoryPath);
            }
            int numPages = pdfDoc.GetNumPages();
            if (beginPageNum <= 0) beginPageNum = 1;
            if (endPageNum > numPages || endPageNum <= 0) endPageNum = numPages;
            if (beginPageNum > endPageNum)
            {
                throw new ArgumentException("\"beginPageNum\" must not be greater than \"endPageNum\"!");
            }
            if (format == null) format = ImageFormat.Png;
            if (zoom <= 0) zoom = 1;

            // Convert page by page
            for (int i = beginPageNum; i <= endPageNum; i++)
            {
                // Acquire the current page (AcquirePage is 0-based)
                Acrobat.CAcroPDPage pdfPage = (Acrobat.CAcroPDPage)pdfDoc.AcquirePage(i - 1);
                // Size of the current page
                Acrobat.CAcroPoint pdfPoint = (Acrobat.CAcroPoint)pdfPage.GetSize();
                // Clip rectangle for the current page
                Acrobat.CAcroRect pdfRect = (Acrobat.CAcroRect)Microsoft.VisualBasic.Interaction.CreateObject("AcroExch.Rect", "");
                try
                {
                    // Width and height after scaling; zoom == 1 keeps the original size
                    int imgWidth = (int)(pdfPoint.x * zoom);
                    int imgHeight = (int)(pdfPoint.y * zoom);
                    // Set the clip rectangle to the (scaled) page size
                    pdfRect.Left = 0;
                    pdfRect.right = (short)imgWidth;
                    pdfRect.Top = 0;
                    pdfRect.bottom = (short)imgHeight;
                    // Render the clipped area as a bitmap onto the clipboard (zoom given in percent)
                    pdfPage.CopyToClipboard(pdfRect, 0, 0, (short)(100 * zoom));
                    IDataObject clipboardData = Clipboard.GetDataObject();
                    // If the clipboard holds a bitmap, save it as a file in the requested format
                    if (clipboardData != null && clipboardData.GetDataPresent(DataFormats.Bitmap))
                    {
                        using (Bitmap pdfBitmap = (Bitmap)clipboardData.GetData(DataFormats.Bitmap))
                        {
                            pdfBitmap.Save(
                                Path.Combine(imageDirectoryPath, i.ToString("0000") + "." + format.ToString().ToLowerInvariant()),
                                format);
                        }
                    }
                }
                finally
                {
                    // Release this page's COM objects before moving on to the next page
                    Marshal.ReleaseComObject(pdfRect);
                    Marshal.ReleaseComObject(pdfPoint);
                    Marshal.ReleaseComObject(pdfPage);
                }
            }
        }
        finally
        {
            // Close the document and release the COM object, even if an error occurred
            pdfDoc.Close();
            Marshal.ReleaseComObject(pdfDoc);
        }
    }
}
```
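Note: this method also relies on Acrobat. The C# project must add a COM reference to the Acrobat library and a reference to Microsoft.VisualBasic.dll (look for it under Microsoft.NET/Framework/V... on the local machine). In addition, when the program is moved to another server, the DCOM settings must be adjusted with dcomcnfg to change the ActiveX permissions.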
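The argument handling in ConvertPdf2Image does not depend on Acrobat: a begin page at or below 0 becomes 1, an end page at or below 0 (or past the last page) becomes the last page, and the page size in PDF points is multiplied by zoom to get the image size in pixels. A minimal Java sketch of those same rules, handy for checking what a given call will produce; PageRangeRules is a hypothetical name:

```java
final class PageRangeRules {
    /** Inclusive, 1-based page range after clamping. */
    static final class Range {
        final int begin;
        final int end;

        Range(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Mirrors the checks in ConvertPdf2Image. */
    static Range normalize(int begin, int end, int numPages) {
        if (begin <= 0) begin = 1;                          // default: start at the first page
        if (end <= 0 || end > numPages) end = numPages;     // default: stop at the last page
        if (begin > end) {
            throw new IllegalArgumentException("beginPageNum must not be greater than endPageNum");
        }
        return new Range(begin, end);
    }

    /** Image size in pixels for a page measured in points; zoom <= 0 falls back to 1 as in the C# code. */
    static int[] scaledSize(double widthPt, double heightPt, double zoom) {
        if (zoom <= 0) zoom = 1;
        // (int) truncates, matching the C# casts
        return new int[] { (int) (widthPt * zoom), (int) (heightPt * zoom) };
    }
}
```

Main calls ConvertPdf2Image(..., 0, 0, null, 1), which under these rules means every page at one pixel per point; an A4 page (595 × 842 points) at zoom 2 would give a 1190 × 1684 pixel image.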
Method 3: Using Apache PDFBox. Java snippet:
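```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFImageWriter; // PDFBox 1.x class, removed in PDFBox 2.0

PDDocument document = PDDocument.load("C:/test.pdf");
PDFImageWriter imageWriter = new PDFImageWriter();
// document, image format, password, start page, end page, output file prefix
imageWriter.writeImage(document, "jpg", "", 1, 1, "p1");
document.close();
```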
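Note: this is the most widely used method, but also the most criticized one: it fails when generating images from large PDFs with hundreds or thousands of pages, and fixing that requires modifying the PDFBox source code.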
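A possible mitigation that avoids patching PDFBox, offered here as an untested assumption rather than the author's fix, is to render a large file in fixed-size page ranges and reload the document between ranges, so that resources cached for earlier pages can be released. The Java sketch below only computes the ranges; PageBatches and the batch size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PageBatches {
    /** Splits pages 1..numPages into consecutive inclusive {start, end} ranges of at most batchSize pages. */
    static List<int[]> split(int numPages, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> batches = new ArrayList<>();
        for (int start = 1; start <= numPages; start += batchSize) {
            // The last batch may be shorter than batchSize
            int end = Math.min(start + batchSize - 1, numPages);
            batches.add(new int[] { start, end });
        }
        return batches;
    }
}
```

Each {start, end} pair would then be passed as the start and end page arguments of writeImage (the two 1s in the call above), with PDDocument.load and document.close() wrapped around every batch.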
Method 4: Using PDFRenderer. Java snippet:
First download the PDFRenderer jar and add it to your Java project.
Download: https://java.net/projects/pdf-renderer/downloads
```java
import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;
import java.awt.Graphics;
import java.awt.Image;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import javax.imageio.ImageIO;

public class Pdf2Jpg {
    public void convert() throws Exception {
        // Load the PDF by memory-mapping the file (the mapping stays valid after the channel is closed)
        PDFFile pdffile;
        try (RandomAccessFile raf = new RandomAccessFile(new File("c:/test.pdf"), "r");
             FileChannel channel = raf.getChannel()) {
            ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            pdffile = new PDFFile(buf);
        }
        // Number of pages in the PDF
        int numPages = pdffile.getNumPages();
        // Render each page to its own image (PDFRenderer page numbers start at 1)
        for (int i = 1; i <= numPages; i++) {
            PDFPage page = pdffile.getPage(i);
            // Create the image at the page's size
            Rectangle rect = new Rectangle(0, 0, (int) page.getWidth(), (int) page.getHeight());
            Image img = page.getImage(
                    rect.width, rect.height, // width & height
                    rect,                    // clip rect
                    null,                    // null for the ImageObserver
                    true,                    // fill background with white
                    true                     // block until drawing is done
            );
            // Copy onto an RGB BufferedImage so ImageIO can encode it as JPEG
            BufferedImage bufferedImage = new BufferedImage(rect.width, rect.height, BufferedImage.TYPE_INT_RGB);
            Graphics g = bufferedImage.createGraphics();
            g.drawImage(img, 0, 0, null);
            g.dispose();
            // ImageIO.write discards an existing file's contents, so no explicit delete is needed
            File outFile = new File("c:/p" + i + ".jpg");
            ImageIO.write(bufferedImage, "jpg", outFile);
        }
    }
}
```
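The key step in the PDFRenderer code is copying the rendered page onto a TYPE_INT_RGB BufferedImage before calling ImageIO.write: JPEG has no alpha channel, and an image with transparency may be rejected by the JPEG writer or come out with wrong colors. The following standard-library-only sketch isolates that step and also paints a white background explicitly; the class JpegPageWriter is hypothetical, and any AWT Image (such as the one returned by page.getImage) can be passed in.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class JpegPageWriter {
    /** Draws an AWT image onto an opaque TYPE_INT_RGB canvas with a white background. */
    static BufferedImage toRgb(Image img, int width, int height) {
        BufferedImage rgb = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // Transparent areas of the source end up white instead of black
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.drawImage(img, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    /** Encodes an image as JPEG bytes in memory. */
    static byte[] toJpeg(BufferedImage rgb) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false when no installed writer accepts this image and format
            if (!ImageIO.write(rgb, "jpg", out)) {
                throw new IllegalStateException("no JPEG writer accepted this image");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

In convert(), the BufferedImage/Graphics block could be replaced by JpegPageWriter.toRgb(img, rect.width, rect.height), with the bytes from toJpeg written to the per-page output file.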
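There are many PDF-processing examples from outside China that readers can also consult, for example the JPedal PDF library, which can not only generate images but also convert a PDF to HTML almost perfectly and extract text coordinates, so that PDFs can be browsed quickly in a web page. PDFBox can also read text sizes and coordinates and can produce HTML, but the results are much worse than JPedal's.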
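Placing extracted text in HTML comes down to a change of coordinate system: PDF user space measures in points (1/72 inch) with the origin at the bottom-left of the page and y increasing upward, while a CSS pixel is 1/96 inch with the origin at the top-left and y increasing downward. The Java sketch below shows only that mapping. It assumes an unrotated page with no crop-box offset and puts the span's top edge at the given y; since PDF libraries often report text at its baseline, a real converter would also shift by the font ascent. The class PdfToCssCoords is hypothetical, not part of JPedal or PDFBox.

```java
import java.util.Locale;

final class PdfToCssCoords {
    /** Points (1/72 inch) to CSS pixels (1/96 inch). */
    static double px(double pt) {
        return pt * 96.0 / 72.0;
    }

    /** Maps (x, y) in PDF user space to CSS {left, top} in px for a page of the given height in points. */
    static double[] toCss(double xPt, double yPt, double pageHeightPt) {
        // Flip the y axis: PDF y grows upward from the bottom, CSS top grows downward from the top
        return new double[] { px(xPt), px(pageHeightPt - yPt) };
    }

    /** Builds an absolutely positioned span for one text run, escaping the HTML special characters. */
    static String span(String text, double xPt, double yPt, double pageHeightPt) {
        double[] p = toCss(xPt, yPt, pageHeightPt);
        String safe = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return String.format(Locale.ROOT,
                "<span style=\"position:absolute;left:%.1fpx;top:%.1fpx\">%s</span>", p[0], p[1], safe);
    }
}
```

The spans would sit inside a position:relative container sized to the page, for example px(595) × px(842) for A4, with the page image from one of the methods above as its background.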