I want to read a .docx file paragraph by paragraph and check the font family, font size, margins, alignment, color, etc. of each paragraph. This is an example of my .docx file:
And this is my code:
import java.io.FileInputStream;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// Open the document and walk its body paragraphs in order
FileInputStream fis = new FileInputStream("D:/test3.docx");
XWPFDocument docx = new XWPFDocument(fis);
List<XWPFParagraph> paragraphList = docx.getParagraphs();
for (int i = 0; i < paragraphList.size(); i++) {
    System.out.println("paragraph " + i + " is:: " + paragraphList.get(i).getText());
    // Each run is a span of text that shares one set of run properties
    for (XWPFRun run : paragraphList.get(i).getRuns()) {
        System.out.println("paragraph :: run text is:: " + run.text());
        System.out.println("paragraph :: run color is:: " + run.getColor());
        System.out.println("paragraph :: run font-family is:: " + run.getFontFamily()); // It always returns null; why?
        System.out.println("paragraph :: run font-name is:: " + run.getFontName()); // It always returns null; why?
        System.out.println("paragraph :: run text position is:: " + run.getTextPosition()); // It always returns -1; why?
        System.out.println("paragraph :: run font-size is:: " + run.getFontSize());
        System.out.println("paragraph :: run IsBold:: " + run.isBold());
        System.out.println("paragraph :: run IsItalic:: " + run.isItalic());
    }
}
But fontFamily (for every font family I choose) and fontName are always null, and textPosition is always -1. I have another code sample that looks up the paragraph styles instead:
XWPFStyles styles = docx.getStyles();
for (int i = 0; i < paragraphList.size(); i++) {
    System.out.println("paragraph " + i + " styleID is:: " + paragraphList.get(i).getStyleID());
    if (paragraphList.get(i).getStyleID() != null) {
        String styleid = paragraphList.get(i).getStyleID();
        XWPFStyle style = styles.getStyle(styleid);
        if (style != null) {
            System.out.println("style name is:: " + style.getName());
            if (style.getName().startsWith("heading")) {
                System.out.println("This part of text is heading!!");
            }
        }
    }
}
But the style is usually null, except for headings.
3 answers
#1
1
Apache POI parses the document.xml part of a .docx file. When you call run.getFontFamily(), it returns the font family only if one is present in the run properties of that run; otherwise it returns null. For example, consider this sample run:
<w:r>
<w:rPr>
<w:lang w:val="en-US"/>
</w:rPr>
<w:t>The quick brown fox jumps over the lazy dog.</w:t>
</w:r>
This run does not have its font family specified in the <w:rPr> run properties tag. In cases like this, you have to go up the hierarchy and check whether the paragraph that contains the run has a style. If even the <w:pPr> paragraph properties do not reference a style, the document's default font family is applied. The document defaults are specified in the styles.xml file.
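The lookup order described above (run properties first, then the paragraph's style, then the document defaults) can be sketched as a simple fallback chain. This is only an illustration in plain Java: StyleEntry and FontFallback are hypothetical names, not POI classes, and the map stands in for the contents of styles.xml. The sketch also follows a style's parent through w:basedOn, which is an addition to the order described above; real documents can involve further sources such as theme fonts, which it ignores.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for one style entry in styles.xml; NOT a POI class.
final class StyleEntry {
    final String font;     // font named by this style, or null if it names none
    final String basedOn;  // ID of the parent style (w:basedOn), or null

    StyleEntry(String font, String basedOn) {
        this.font = font;
        this.basedOn = basedOn;
    }
}

// Hypothetical helper showing the fallback order; NOT part of Apache POI.
final class FontFallback {
    // Returns the first font found: the run's own font, then the paragraph
    // style and its parents, then the document default.
    static String resolve(String runFont, String styleId,
                          Map<String, StyleEntry> styles, String docDefaultFont) {
        if (runFont != null) {
            return runFont; // the run properties win when they name a font
        }
        Set<String> seen = new HashSet<>(); // guards against cyclic basedOn links
        String id = styleId;
        while (id != null && seen.add(id)) {
            StyleEntry style = styles.get(id);
            if (style == null) {
                break; // unknown style ID: fall through to the default
            }
            if (style.font != null) {
                return style.font;
            }
            id = style.basedOn; // climb to the parent style
        }
        return docDefaultFont;
    }
}
```

With POI, the inputs would come from run.getFontFamily(), the paragraph's getStyleID(), and the parsed styles part; here they are passed in directly so the order of the checks stays visible.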
#2
1
Here is sample code that reads the font from styles.xml using Apache POI.
XWPFDocument docx; // set this to the opened document
XWPFRun run; // set this to the run being inspected
String fontFamily = run.getFontFamily();
if (fontFamily == null) { // no font on the run itself: check the paragraph's style in styles.xml
    String styleID = run.getParagraph().getStyleID();
    XWPFStyle style = (styleID == null) ? null : docx.getStyles().getStyle(styleID);
    CTRPr ctrPr = (style == null) ? null : style.getCTStyle().getRPr();
    // getRFonts() is the accessor in older POI schema jars; newer ones may expose getRFontsArray(0) instead
    CTFonts ctFonts = (ctrPr == null) ? null : ctrPr.getRFonts();
    if (ctFonts != null) {
        fontFamily = ctFonts.getAscii(); // or getCs(), getHAnsi(), etc.
    }
}
// still null here means neither the run nor its paragraph style names a font
return fontFamily;
Hope this helps.
#3
0
I found out that the fonts I wanted to check were not standard. For other, standard fonts, all of the code above works perfectly.