这几天一直在找从xml+xsl转换成PDF文档的方案,最好是用python实现的,找了好多国外的网站,最终还是在csdn上找到了,看来还是自家兄弟靠谱啊,在此谢谢wyuan8913了:-):
转换过程:
http://blog.csdn.net/wyuan8913/article/details/7185288
http://blog.csdn.net/wyuan8913/article/details/7185655
详情如下:
python将xml,xsl文件转成html文件存储
分类: python2012-01-08 16:43 8人阅读 评论(0) 收藏 举报前提:安装libxml2 libxstl
官方网站:http://xmlsoft.org/XSLT/index.html
安装包下载:http://xmlsoft.org/sources/
下面是windows平台的exe安装文件下载:
http://xmlsoft.org/sources/win32/python/
这是转载的测试代码:
- # -*- coding: mbcs -*-
- #!/usr/bin/python
- import libxml2, libxslt
- class compoundXML:
- def __init__(self):
- self._result = None
- self._xsl = None
- self._xml = None
- def do(self, xml_file_name, xsl_file_name):
- self._xml = libxml2.parseFile(xml_file_name)
- if self._xml == None:
- return 0
- styledoc = libxml2.parseFile(xsl_file_name)
- if styledoc == None:
- return 0
- self._xsl = libxslt.parseStylesheetDoc(styledoc)
- if self._xsl == None:
- return 0
- self._result = self._xsl.applyStylesheet(self._xml, None)
- def get_xml_doc(self):
- return self._result
- def get_translated(self):
- return self._result.serialize('UTF-8')
- def save_translated(self, file_name):
- self._xsl.saveResultToFilename(file_name, self._result, 0)
- def release(self):
- '''''
- this function must be called in the end.
- '''
- self._xsl.freeStylesheet()
- self._xml.freeDoc()
- self._result.freeDoc()
- self._xsl = None
- self._xml = None
- self._result = None
- if __name__ == '__main__':
- test = compoundXML()
- test.do('test/testxmlutil.xml', 'test/testxmlutil.xsl')
- print test.get_translated()
- test.save_translated('test/testxmlutil.htm')
- test.release()
这里有个问题是生成的html文件里的汉字从浏览器看都是乱码,首行加入
<metacharset="utf8"/>
即可正常显示。
python将html转成PDF,包含中文
分类: python2012-01-08 20:18 82人阅读 评论(0) 收藏 举报前提:
安装xhtml2pdf
下载字体:code2000.ttf;给个地址:http://ishare.iask.sina.com.cn/f/22120225.html
待转换的文件:1.htm
- <meta charset="utf8"/>
- <style type='text/css'>
- @font-face {
- font-family: "code2000";
- src: url("code2000.ttf")
- }
- html {
- font-family: code2000;
- }
- </style>
- <html>
- <body><table>
- <tr>
- <td>文字</td>
- <td>123</td>
- </tr>
- <tr>
- <td>图片</td>
- <td><img src="1.jpg"></td>
- </tr>
- </table></body></html>
html_to_pdf.py程序
- # -*- coding: utf-8 -*-
- import sx.pisa3 as pisa
- data= open('1.htm').read()
- result = file('test.pdf', 'wb')
- pdf = pisa.CreatePDF(data, result)
- result.close()
- pisa.startViewer('test.pdf')
说明:xhtml2pdf不能识别汉字,需要在html文件中通过CSS的方式嵌入code2000字体,貌似只能用code2000,原因不明。