How do I convert an MS Word file to PDF using Apache POI?
I am using the following code, but it is not working and gives errors. I guess I am importing the wrong classes?
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.poi.hslf.record.Document;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class TestCon {
    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        POIFSFileSystem fs = null;
        Document document = new Document();
        try {
            System.out.println("Starting the test");
            fs = new POIFSFileSystem(new FileInputStream("/document/test2.doc"));
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            OutputStream file = new FileOutputStream(new File("/document/test.pdf"));
            PdfWriter writer = PdfWriter.getInstance(document, file);
            Range range = doc.getRange();
            document.open();
            writer.setPageEmpty(true);
            document.newPage();
            writer.setPageEmpty(true);
            String[] paragraphs = we.getParagraphText();
            for (int i = 0; i < paragraphs.length; i++) {
                org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
                // CharacterRun run = pr.getCharacterRun(i);
                // run.setBold(true);
                // run.setCapitalized(true);
                // run.setItalic(true);
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
                System.out.println("Length:" + paragraphs[i].length());
                System.out.println("Paragraph" + i + ": " + paragraphs[i].toString());
                // add the paragraph to the document
                document.add(new Paragraph(paragraphs[i]));
            }
            System.out.println("Document testing completed");
        } catch (Exception e) {
            System.out.println("Exception during test");
            e.printStackTrace();
        } finally {
            // close the document
            document.close();
        }
    }
}
6 Answers
#1
8
Got it solved. Document, Paragraph and PdfWriter have to be imported from iText (com.lowagie.text), not from POI:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class TestCon {
    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        POIFSFileSystem fs = null;
        Document document = new Document();
        try {
            System.out.println("Starting the test");
            fs = new POIFSFileSystem(new FileInputStream("D:/Resume.doc"));
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            OutputStream file = new FileOutputStream(new File("D:/test.pdf"));
            PdfWriter writer = PdfWriter.getInstance(document, file);
            Range range = doc.getRange();
            document.open();
            writer.setPageEmpty(true);
            document.newPage();
            writer.setPageEmpty(true);
            String[] paragraphs = we.getParagraphText();
            for (int i = 0; i < paragraphs.length; i++) {
                org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
                // CharacterRun run = pr.getCharacterRun(i);
                // run.setBold(true);
                // run.setCapitalized(true);
                // run.setItalic(true);
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
                System.out.println("Length:" + paragraphs[i].length());
                System.out.println("Paragraph" + i + ": " + paragraphs[i].toString());
                // add the paragraph to the document
                document.add(new Paragraph(paragraphs[i]));
            }
            System.out.println("Document testing completed");
        } catch (Exception e) {
            System.out.println("Exception during test");
            e.printStackTrace();
        } finally {
            // close the document
            document.close();
        }
    }
}
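For anyone wondering what the replaceAll call in the loop does: it removes the line terminators from each paragraph string before the text is handed to iText. In Java regex syntax \cM is the control character Ctrl-M, which is a carriage return. Here is a minimal JDK-only sketch of just that step; the class and method names (ParagraphCleaner, stripLineEndings) are made up for illustration and are not part of POI or iText.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of POI or iText.
final class ParagraphCleaner {
    // Same pattern as in the answer: optional CR (\cM), optional CR, then LF.
    private static final Pattern LINE_ENDING = Pattern.compile("\\cM?\r?\n");

    // Removes every CR/LF sequence that ends in a line feed.
    static String stripLineEndings(String paragraph) {
        return LINE_ENDING.matcher(paragraph).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println("[" + stripLineEndings("First paragraph\r\n") + "]");
    }
}
```

Note that the pattern requires a line feed, so a string ending in a bare carriage return is left unchanged.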
#2
2
This worked for me:
Source: http://www.programcreek.com/java-api-examples/index.php?api=org.apache.poi.xwpf.converter.pdf.PdfConverter
package pdf;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class PDF {
    public static void main(String[] args) throws Exception {
        String inputFile = "D:/TEST.docx";
        String outputFile = "D:/TEST.pdf";
        // Optional command-line override: <input.docx> <output.pdf>
        if (args != null && args.length == 2) {
            inputFile = args[0];
            outputFile = args[1];
        }
        System.out.println("inputFile:" + inputFile + ",outputFile:" + outputFile);
        File outFile = new File(outputFile);
        // try-with-resources closes both streams even if the conversion fails
        try (FileInputStream in = new FileInputStream(inputFile);
             OutputStream out = new FileOutputStream(outFile)) {
            XWPFDocument document = new XWPFDocument(in);
            PdfOptions options = null;
            PdfConverter.getInstance().convert(document, out, options);
        }
    }
}
#3
1
There are several steps here:
- Read the Word document using POI into a format-agnostic form
- Convert the format-agnostic form into PDF
- Write the PDF
I don't know if POI will do step 2 for you. I'd recommend something else, like iText.
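The split into steps above could be sketched roughly like this. It is only an outline of the structure, using nothing but the JDK: the interface and class names (ParagraphSource, ParagraphSink, ConversionPipeline) are hypothetical, the "format-agnostic form" is assumed to be a plain list of paragraph strings, and the stand-in sink writes plain text where a real one would call a PDF library such as iText.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Step 1 (hypothetical): something that reads a Word file, e.g. backed by POI.
interface ParagraphSource {
    List<String> readParagraphs() throws IOException;
}

// Steps 2 and 3 (hypothetical): something that renders and writes, e.g. backed by iText.
interface ParagraphSink {
    void write(List<String> paragraphs, OutputStream out) throws IOException;
}

final class ConversionPipeline {
    // Glue code: the intermediate form is just a list of paragraph strings.
    static void convert(ParagraphSource source, ParagraphSink sink, OutputStream out)
            throws IOException {
        sink.write(source.readParagraphs(), out);
    }

    // Runs the pipeline with stand-ins: a fixed source and a plain-text sink.
    static String demo() {
        ParagraphSource source = () -> Arrays.asList("Title", "Body");
        ParagraphSink sink = (paragraphs, out) -> {
            for (String p : paragraphs) {
                out.write((p + "\n").getBytes(StandardCharsets.UTF_8));
            }
        };
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            convert(source, sink, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Keeping the reading and writing sides behind separate interfaces means the POI part and the PDF part can be swapped or tested independently.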
#4
1
As a side note, it's also possible to read content on the fly directly from a Word/Excel content stream instead of reading it from the filesystem and serializing it to disk, for example when retrieving content from CMIS repositories:
// HWPFDocument docx = new HWPFDocument(fs);
HWPFDocument docx = new HWPFDocument(doc.getContentStream().getStream());
(doc is of type org.apache.chemistry.opencmis.client.api.Document; in this case I adapted your code to retrieve a Word file from an Alfresco repository by means of OpenCMIS and transformed it to PDF.)
HTH
#5
1
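The code below worked for me:
// The iText 2.x package (com.lowagie.text) is assumed for these imports, as in answer #1
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocToPdfConverter {
    public static void main(String[] args) {
        String k = null;
        OutputStream fileForPdf = null;
        try {
            String fileName = "/document/test2.doc";
            // Below code is for .doc files
            if (fileName.endsWith(".doc")) {
                HWPFDocument doc = new HWPFDocument(new FileInputStream(fileName));
                WordExtractor we = new WordExtractor(doc);
                k = we.getText();
                fileForPdf = new FileOutputStream(new File("/document/DocToPdf.pdf"));
                we.close();
            }
            // Below code is for .docx files
            else if (fileName.endsWith(".docx")) {
                XWPFDocument docx = new XWPFDocument(new FileInputStream(fileName));
                // using the XWPFWordExtractor class
                XWPFWordExtractor we = new XWPFWordExtractor(docx);
                k = we.getText();
                fileForPdf = new FileOutputStream(new File("/document/DocxToPdf.pdf"));
                we.close();
            } else {
                // Neither branch matched, so there is nothing to write
                throw new IllegalArgumentException("Not a .doc or .docx file: " + fileName);
            }
            // Write the extracted text into the PDF as a single paragraph
            Document document = new Document();
            PdfWriter.getInstance(document, fileForPdf);
            document.open();
            document.add(new Paragraph(k));
            document.close();
            fileForPdf.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}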
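The answer above picks the POI class by file extension. If the extension can't be trusted, one alternative is to look at the first bytes of the file instead: the old binary .doc format is an OLE2 compound file, while .docx is a ZIP archive. A minimal JDK-only sketch follows; WordFormatSniffer and its members are hypothetical names, and the result is only a hint, because other Office formats share the same containers (.xls is also OLE2, and any ZIP file starts with the same signature).

```java
// Hypothetical helper for illustration only; not part of POI.
final class WordFormatSniffer {
    enum Format { DOC, DOCX, UNKNOWN }

    // OLE2 compound file signature (used by .doc, but also .xls and .ppt).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK\003\004" (used by .docx, but also any ZIP).
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    // Guesses the container format from the leading bytes of a file.
    static Format sniff(byte[] head) {
        if (startsWith(head, OLE2)) {
            return Format.DOC;
        }
        if (startsWith(head, ZIP)) {
            return Format.DOCX;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sniff(new byte[] {0x50, 0x4B, 0x03, 0x04, 0x14}));
    }
}
```

In the converter, the first eight bytes of the input file would be read and passed to sniff before deciding between HWPFDocument and XWPFDocument.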
#6
0
In addition to Kushagra's answer, here are the updated Maven dependencies:
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
<version>2.0.1</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.xdocreport.converter</artifactId>
<version>2.0.1</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
<version>2.0.1</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>
<version>2.0.1</version>
</dependency>