I am using python 3.4 along with python-docx
library to work on .docx
files. I have been able to extract text from the document. But my objective is to extract only those text with certain font (and modify them).
我使用python 3.4和python-docx库来处理.docx文件。我已经能够从文档中提取文本。但我的目标是仅提取具有特定字体的文本(并修改它们)。
I have been searching for this in the library documentation for the past two days with no result.
过去两天我一直在图书馆文档中搜索这个,没有结果。
Does anybody here have experience with this library, if so could they point me in the right direction.
这里有没有人有这个图书馆的经验,如果是这样,他们可以指出我正确的方向。
1 个解决方案
#1
2
At present, python-docx
only has the ability to apply a font typeface using a style. You can detect runs having a particular style like this:
目前,python-docx只能使用样式应用字体字体。您可以检测具有如下特定样式的运行:
document = Document('having-fonts.docx')
for paragraph in document.paragraphs:
for run in paragraph.runs:
if run.style == style_I_want:
print run.text
If the special fonts are applied using a paragraph style you could use this:
如果使用段落样式应用特殊字体,您可以使用:
document = Document('having-fonts.docx')
for paragraph in document.paragraphs:
if paragraph.style == style_I_want:
print paragraph.text
If you can say more about the particulars I may be able to be more specific.
如果您可以更详细地说明细节,我可能会更具体。
#1
2
At present, python-docx
only has the ability to apply a font typeface using a style. You can detect runs having a particular style like this:
目前,python-docx只能使用样式应用字体字体。您可以检测具有如下特定样式的运行:
document = Document('having-fonts.docx')
for paragraph in document.paragraphs:
for run in paragraph.runs:
if run.style == style_I_want:
print run.text
If the special fonts are applied using a paragraph style you could use this:
如果使用段落样式应用特殊字体,您可以使用:
document = Document('having-fonts.docx')
for paragraph in document.paragraphs:
if paragraph.style == style_I_want:
print paragraph.text
If you can say more about the particulars I may be able to be more specific.
如果您可以更详细地说明细节,我可能会更具体。