How can I take screenshots of the formulas in a Word document with Python? Help needed

As the title says. I don't know Python, but I need it for a task I'm working on right now. I'd appreciate any pointers.

11 solutions

#1

Bump

#2

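Not knowing Python is going to make this rather troublesome!

There are several Python packages that can handle Word files, but none of them can output images directly, probably because this is a fairly uncommon requirement.
There are, however, quite a few packages that can export images straight from a PDF.

I can only give you a workable approach and the steps; you'll have to work out the details yourself.

Step one: save the Word file as a PDF.
The pywin32 package can do this.
Install pywin32 first:
https://pypi.python.org/pypi/pywin32/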



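from win32com import client as wc

word = wc.Dispatch('Word.Application')            # start Word through COM (Windows, Word installed)
doc = word.Documents.Open('/FilePath/test.docx')  # replace with the absolute path of your file
doc.SaveAs('/DestPath/test.pdf', 17)              # 17 is wdFormatPDF in Word's WdSaveFormat enumeration
doc.Close()
word.Quit()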

Step two: use Wand to convert the PDF into images:
(http://docs.wand-py.org/en/0.4.4/)


from wand.image import Image 
with Image(filename='filename.pdf') as pdf: 
    with pdf.convert('jpeg') as image: 
        image.save(filename='result.jpeg')

Alternatively, the PyMuPDF package can export images from a PDF:
https://github.com/rk700/PyMuPDF


#! python
'''
This demo extracts all images of a PDF as PNG files that are referenced
by pages.
Runtime is determined by number of pages and volume of stored images.
Usage:
extract_img1.py input.pdf

Method names follow recent PyMuPDF releases; the old camelCase names
are noted in the comments.
'''
import fitz
import sys, time

assert len(sys.argv) == 2, 'Usage: %s <input file>' % sys.argv[0]

t0 = time.perf_counter()               # time.clock() was removed in Python 3.8
doc = fitz.open(sys.argv[1])
imgcount = 0
lenXREF = doc.xref_length()            # formerly doc._getXrefLength()

# display some file info
print("file: %s, pages: %s, objects: %s" % (sys.argv[1], len(doc), lenXREF-1))

for i in range(len(doc)):
    imglist = doc.get_page_images(i)   # formerly doc.getPageImageList(i)
    for img in imglist:
        xref = img[0]                  # xref number
        pix = fitz.Pixmap(doc, xref)   # make pixmap from image
        imgcount += 1
        if pix.n - pix.alpha < 4:      # gray or RGB: can be saved as PNG
            pix.save("p%s-%s.png" % (i, xref))   # formerly pix.writePNG(...)
        else:                          # must convert CMYK first
            pix0 = fitz.Pixmap(fitz.csRGB, pix)
            pix0.save("p%s-%s.png" % (i, xref))
            pix0 = None                # free Pixmap resources
        pix = None                     # free Pixmap resources

t1 = time.perf_counter()
print("run time", round(t1-t0, 2))
print("extracted images", imgcount)
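
One caveat about the script above: it only pulls out raster images that are embedded in the PDF. Formulas typed with Word's equation editor usually arrive in the PDF as text and vector drawing rather than as embedded images, so rendering the page (or a region of it) is probably closer to a real screenshot. A minimal sketch, assuming a recent PyMuPDF; the function names, the 200 dpi default and the clip rectangle are illustrative choices, not something from the PyMuPDF demo:

```python
def zoom_for_dpi(dpi):
    """PDF pages are measured in points (72 per inch), so zoom = dpi / 72."""
    return dpi / 72.0


def render_region(pdf_path, page_number, out_path, clip=None, dpi=200):
    """Render one page, or a rectangle on it, to a PNG file.

    clip is an optional (x0, y0, x1, y1) tuple in PDF points; finding the
    rectangle that encloses a given formula is left to the caller.
    """
    import fitz  # PyMuPDF, imported here so the helper above works without it

    zoom = zoom_for_dpi(dpi)
    doc = fitz.open(pdf_path)
    try:
        page = doc[page_number]
        rect = fitz.Rect(*clip) if clip else None
        # A larger zoom matrix gives a sharper image than the 72 dpi default
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), clip=rect)
        pix.save(out_path)
    finally:
        doc.close()
```

For example, render_region('test.pdf', 0, 'page0.png') would render the whole first page; passing clip narrows the output to one area of the page.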

#3

If the formulas in the Word file are simply stored as pictures,
you can use python-docx:
https://github.com/python-openxml/python-docx
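
For that picture case, here is a minimal sketch. It deliberately uses only the standard-library zipfile module instead of python-docx, relying on the fact that a .docx file is a ZIP archive whose embedded pictures sit under word/media/. The function name and the output folder are made up for illustration. Pictures belonging to older equation objects are often WMF or EMF files, so they may need one more conversion step before they are usable as ordinary images.

```python
import os
import zipfile


def extract_docx_media(docx_path, out_dir):
    """Copy every file under word/media/ in a .docx into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip directory entries; keep only real files in word/media/
            if name.startswith("word/media/") and not name.endswith("/"):
                target = os.path.join(out_dir, os.path.basename(name))
                with open(target, "wb") as handle:
                    handle.write(archive.read(name))
                saved.append(target)
    return saved
```

Calling extract_docx_media('test.docx', 'images') returns the list of files it wrote.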

#4

The line from win32com import client as wc gives the error "multiple statements found while compiling a single statement". What's going on?

#5

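Judging from the error message, the problem is probably something like missing line breaks.

Python takes one statement per line and uses indentation to mark the extent of a block.
You should learn some Python basics first.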
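
For what it's worth, this exact message usually shows up when several lines are pasted at once into an interactive prompt such as the IDLE shell, which compiles one statement at a time. That is an assumption about how the code was run; the thread doesn't say. The sketch below reproduces the message with the built-in compile(); the variable names are made up for the demonstration:

```python
source = "a = 1\nb = 2\n"

# 'single' mode is what an interactive prompt uses: one statement per input
try:
    compile(source, "<pasted>", "single")
    message = None
except SyntaxError as exc:
    message = exc.msg

print(message)

# 'exec' mode is what running a saved .py file uses: many statements are fine
namespace = {}
exec(compile(source, "<script>", "exec"), namespace)
print(namespace["a"], namespace["b"])
```

Saving the code to a .py file and running that file, or entering the lines one at a time, avoids the error.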

#6

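Quoting xpresslink's reply in #5:

Judging from the error message, the problem is probably something like missing line breaks.

Python takes one statement per line and uses indentation to mark the extent of a block.
You should learn some Python basics first.

I can't download PyMuPDF. It says: Uploads are disabled.
File uploads require push access to this repository.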

#7

It's not just your Python that needs work, your English does too.

Installation

If you had not previously installed MuPDF, you must first do this. This process highly depends on your system. For most platforms, the MuPDF source contains prepared procedures on how to achieve this. If you decide to generate MuPDF from sources, be sure to download the official release from https://mupdf.com/downloads/. MuPDF's GitHub repo contains the current development source which probably is incompatible with PyMuPDF.

https://mupdf.com/downloads
https://mupdf.com/downloads/mupdf-1.11-windows.zip

#8

What the poster above suggested basically works. I've got the PDF and JPEG conversion running, and I'll figure out the rest myself.

#9

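Quoting xpresslink's reply in #3:

If the formulas in the Word file are simply stored as pictures,
you can use python-docx:
https://github.com/python-openxml/python-docx

Then may I ask how to get the math formulas in a Word file, the kind that are not pictures?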

#10

Or if you could just share some ideas, that would help too. Thank you!
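
One possible line of attack, sketched under assumptions since nobody in the thread answered this: equations typed with Word's built-in editor are stored in word/document.xml as Office Math (m:oMath) elements, so they can at least be located with the standard library alone. The function name is hypothetical, and the sketch only returns the plain text runs of each equation, with no fraction or superscript structure. Turning them into pictures would still need a rendering step such as the PDF route from #2.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of Office Math Markup Language (OMML) inside .docx files
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def list_equation_texts(docx_path):
    """Return the concatenated text runs of each m:oMath element."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    texts = []
    for omath in root.iter("{%s}oMath" % M_NS):
        # m:t elements hold the literal characters of the equation
        parts = [t.text or "" for t in omath.iter("{%s}t" % M_NS)]
        texts.append("".join(parts))
    return texts
```

An empty list from list_equation_texts('test.docx') would suggest the formulas are pictures or embedded objects instead.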

#11

Bump
