POI按行读取word,并去掉属性标签内容:超链接

时间:2023-03-08 17:27:30
public String readDoc(File file) {
StringBuffer buffer = new StringBuffer();
InputStream input = null;
WordExtractor extractor = null;
String[] paragraphs = null;
try {
input = new FileInputStream(file);
extractor = new WordExtractor(input);
paragraphs = extractor.getParagraphText();
for (String paragraph : paragraphs) {
buffer.append(extractor.stripFields(paragraph)).append("\\\r\\\n");
}
} catch (Exception e) {
e.printStackTrace();
} finally {
if (input != null) {
try {
input.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return buffer.toString();
}

剔除方法:extractor.stripFields(paragraph);

提取文档内容文章。excel,pdf,word.....

http://blog.sina.com.cn/s/blog_67b9ad8d01010bwa.html

出现问题文章:

http://bbs.csdn.net/topics/320055955