读取xml文件,写入excel

时间:2021-07-07 14:05:23

  在上一篇 Python写xml文件已经将所有订单写入xml文件,这一篇我们把xml文件中的内容读出来,写入excel文件。

  输入xml格式:

 <?xml version="1.0" encoding="utf-8"?>
<orderlist>
<order>
<customer>姓名1</customer>
<phone>123456</phone>
<address>成都</address>
<count>2</count>
</order>
<order>
<customer>姓名2</customer>
<phone>234567</phone>
<address>成都</address>
<count>5</count>
</order>
<order>
<customer>姓名3</customer>
<phone>345678</phone>
<address>成都</address>
<count>1</count>
</order>
</orderlist>

  输出excel格式(这里按点餐次数进行的降序排序):

aaarticlea/png;base64,iVBORw0KGgoAAAANSUhEUgAAAT4AAABfCAIAAAD+hTwsAAAKIklEQVR4nO1du67kNgz1V92v2yJVuv2YYJEmQJo0aRdpFjDuYottUuYPlGIQxys+RMkUadE8UOGRNRoeiseUX5rt2/d/smTJslzZfk4kEgti++XX36aWb9//jleS11olJK8tXZa8kteKJaWbvJLXkiWlm7yS15IlpZu8kteSRVm6f31593LZ29ub2W8Z8HoDiMELZReAl/1gaUr3ry/vv//xp7HLzr4z+y0bXhWjGCH+EF4GpNSk+9Ktl3RfnjJTb4a4FqkwvCCR2bx0pHvoNqWrTgpuL83LcnJkyQtSW0C6Z92mdHVJWZ472fBK6WqVq9KtdJvSVScFt5fmldLVKpekC3XrIt2oV2Ifcq5rUEKO17h0Ud16SdfMX168bKjlZSpFajeVLqXblO4kXjbUcrwW4rX801TneXLgW/yRQvyb4UiZ8Vr7kQwXl7mU5LVWCckrpZu8kteSZfqr9lmyZJlRpmfdEhH7+1dvE6YgeS2ElO4IQoZCSV5LwUG627Zt2yapHGhjg2YoNE29D5czeF5dNm//4ePHj9uPuP4TsBn/RZRXbyfDcTsJKd0RXJcu1UYY4pOgIt1t2z58+FA13rbt8+fPTbUIxcNsyHmldC9JV+j68mNMV5jhFx660nWnc0Ai3ab/qQYUO0mb4c4ZXjD24AZlJ19pgw7pDtxorqRbOf18bOZdcDevwVCgwhoGGQNHRi9cz7rHyBZMk81x5F3H7OVd3SVdiZ1dPpkEkXSHnxGBWffspkO61FAd7T99+oTW2/gIgpKu8CMFX1JF71z31fKYIZ+/yHQi1AbTJ9V5xYuKOj4IDxxn710+UYdp1i2EdIvMBXfw1wsXpSsMHXuOTHaSWFjVb8T5LQWqQ96Y0i9dnhfVz7myaukyXp7SRX3RMNfDRxAqE2ZY6U5NeDmHrz/2vnQrVK8kGKC0XhuvSdk5GXbxYjaaFjoOmcO57rH9OiOCu3BD7yHaF1QmzFGlu7GQ9MP7gXJmStdhwsyPLh8KLo7Tle6deZUe6b5q4IWJAoZb0g/1i6iXtpTubOkWIn8yzkJDgfmWAXQnzPLYnY2L57r8V86TrKol9V2m8wK0pCXd5nhRNhjDQbpFaY7U/NY89GZdFEyg3IfXGFBGKtI975L7nJGu/HLx8dMPvcJciLFZXboVJGpcQrowRs+2NQV2BvqIFUOccUKv06ibQ8wXYQP0K14jVYTSHV4BoHmFGdbjVrKY5BoGitItNDt1s5vgZxPo2AnjfuwpSKbz6jSK99vA/WpJQDabTYXpg5DoUEmilnLQfUIcQsLr3JKvsQGaneBHqJmqH8h3wAmwHu22OsVF+xEeaiXjRbW0H7J86W8EIV8iK8lrKUxfJWN//5olSxb1kll3BPv71+2n7Vy8LdJBYF7eJugjpTuCKsS9zVFDYF7eJujDWrrUqX/DSqeLeBTOIe5tiyYC82q2Oa7AMXtvhZTuCI4Q9zZEGYF5Nducr1FL9npdWP7fJEfp8s46N3O/EF/hFeKOBkxCYF7NNky+PYccqmElM/sgle71RzKg8CTkhVnaWNI78XRO9eu8VUxjpith++u8bnWsvAhmvChUPVQRW1aR7vD/VsFHMs4bvMuOjzCIaw6tBuqoQhz9dUWz0T6p7StgeC0NKF3Yhueb0q2lW9WTVsrOQ4S9Xcd16cJDO2W2pD6ly6NLujB/oAkGzSuWGDnXVZFub+Sh0xiqZY8HRsCcO8kPLlRlr3MU+ULpQntWBD9hPiqrjeoraM1K0r1yrotuw4+1iT2hY+NKSrroAMvTMjMfYyQ0T7ozErsLmKwrka48S1uiT7q97/31nuvWxrFAyFg5EZUu8+t8rPBf4SWkS7l3NrEK5NKVBCH6RXu4LXBTVVLNKL8ID42TAEO8+euVFPljUKEjI6U7gGHpwsr1pDv2R93Uq/bNmuauqRHcxMAVyyZr6qOXdH09rAuJdAvr81WzrtYV5sIe23D7+q+vGrjS4OZQSlcRqHTRM5FmfMLZ0zSrGxhZJUPr5hCsR+yjwbRRdREC4aMLjEld7fl+1FiJHzVZDs1ZEhyFai90hbtP7B6E5N3RG5q+jgv5Jkp5Bi9JjMFYhdsPkm4kPCHEIyEkr1wlI0uWJUtm3RHscVeTiMrL2wR9pHRHsMddTSIqL28T9JGv2o9gj7uaRFRezTZb5yNA7kjpjmCPu5pEVF7NNvylY7jX/c5ZrpIxgj3uahJReTXbMPn2HHKohpXM7EOQVTL4TtQBQ4EyCT3Q8AcgpjHT1SCTH7E/5pGMJqoeqogtq0j3/qtkGIfXjj1NBa1CP0ro8A2afQ5jf9KDkBV4gindWrpVPWnlaH6eBEnWHZMumldRG2bwTekWoE8qwaB5xRJ2r9qXH72DHsA4Q1t59W7SlTRA5UFVDjitF8w5YTDpwokenx6oQ/Ma0u090YXnuug2/FibKJsM31m66MDL0zIzT4OSvgJKukvrtuQqGbpZt7QuwKAN+MbTvFSjS7rCXcJQODdTT8KodFfXbclVMq6f6xYihTJJifLLKtJtWlVJkTk2wfZdPyTBwInAEhiWLqxcTLqKr9qXTtUtLV1JZZMI9dFAujF0W3KVDK2sW3rCbl3pDkvuJtKd0b8XUOmiZxzN+ISzpGlWN2D3SEYZCjs4jYQuk8wzdYHe1+WtonZVPXe1V2fN8HKM0etoziagt6u90A/uDslVMkYQ8k2U8gxekhiDsQq3HyTdSHhCiEdCSF65SkaWLEuWzLoj2OOuJhGVl7cJ+kjpjmCPu5pEVF7eJugjX7UfwR53NYmovJptts5HgNyR0h3BHnc1iai8mm34S8dwr/tts1wlYwR73NUkovJqtmHy7TnkUA0rmdkHuz/phMKTkBdmaWNJ7/RLZLAxvEnIHINgJdNenTUMcd9DpBaY8aJQ9VBFbHmUdAtgy7vs+AiDu+Zg/sje3vPAoNw8NBok23y3cuzY01TXu3UHlC5swzNdW7oX39ctmHSretJK2XmIsLfrkEsXRj9lnqTeUrqK3bqjS7rUvKZKMGhesUTf/+sqSpeJQtxQWQawcSV17oTKiarsdULzGNG2u4XnSBdO9LrSQ1lFuodib75Khpkf0RCnSPF5GM5EKCJo9DDtB/Ac6cJtRrryLG0JO+kWOkyp+IMNmsFq40dJ1hUO7blZ79FNPXpSupIgRL9oD9O/xi5EbqT8he6iviLZpQXmZkMlRfmxhtdq02kqrFO6qFf5I/KtpQvT77B0i3hCwu/qTVC6ULzCnNI1gES6hR2LVbOuonSZYxtuX0p3qNsxPEq61VToHJmMb+GsaprVDXTfHLoyYRaGabWLAtVMzzkk5I9koLsGWKC81Fmj93V9A1QFzZtDcHSqvdAV7j7JVTJGEPJNlPIMXpIYg7EKtx8k3Uh4QohHQkheuUpGlixLlsy6I9gjHsVL8loKKd0RhAyFkryWQkp3BCFDoSSvpfAvtayFWBxAQS8AAAAASUVORK5CYII=" alt="" />

  工具库选择:beautifulsoup+lxml用于处理xml文件(当然也可直接使用lxml), xlsxwriter用于写excel文件。

  思路也比较简单:找出xml文件中的每一个<order>,然后进行降序排序,再将每一个order信息写入excel文件,每个order一行。

  代码如下:

 # coding: utf-8

 import bs4
import xlsxwriter # 读取xml文件,写入excel
def xmlToExcel(file_xml, file_excel):
# 打开xml文件,并以此创建一个bs对象
xml = open(file_xml, 'r')
doc = bs4.BeautifulSoup(xml, 'xml') # 创建一个excel文件,并添加一个sheet,命名为orders
workbook = xlsxwriter.Workbook(file_excel)
sheet = workbook.add_worksheet('orders') # 设置粗体
bold = workbook.add_format({'bold': True}) # 先在第一行写标题,用粗体
sheet.write('A1', u'姓名', bold)
sheet.write('B1', u'电话', bold)
sheet.write('C1', u'点餐次数', bold)
sheet.write('D1', u'地址', bold) # 筛选出所有的<order>,这里使用的是CSS选择器
order = doc.select('order') # 以每个order的count元素,对order进行降序排序
sort_key = lambda a: int(a.count.text)
order.sort(key=sort_key, reverse=True) # 行号,具体订单信息从第二行开始
row = 2
# 将每一个订单写入excel
for x in order:
# 提取出具体信息
name = x.customer.text
phone = x.phone.text
cnt = x.count.text
addr = x.address.text # 将具体信息写入excel
sheet.write('A%d' % row, name)
sheet.write('B%d' % row, phone)
sheet.write('C%d' % row, cnt)
sheet.write('D%d' % row, addr) row += 1 # 关闭文件
xml.close()
workbook.close() # 测试代码
if __name__ == '__main__':
file1 = 'hh.xml'
file2 = 'hehe.xlsx' xmlToExcel(file1, file2)