Windows script to reorder and replace text in an XML file?

I have found several similar questions, but none quite reaches my goal: I'm trying to edit multiple lines in an XML file. My knowledge of scripting is very basic at best, so please include some details that my basic brain will understand.

I'm trying to convert this:

    <?xml version="1.0" encoding="UTF-8"?>
  <channels>
    <channel update="i" site="openwebif" site_id="1:0:1:D32E:836:2:11A0000:0:0:0:" xmltv_id="&amp;TV">&amp;TV</channel>
    <channel update="i" site="openwebif" site_id="1:0:1:2F17:7EF:2:11A0000:0:0:0:" xmltv_id="4Music">4Music</channel>
    <channel update="i" site="openwebif" site_id="1:0:1:5302:814:2:11A0000:0:0:0:" xmltv_id="4seven">4seven</channel>

into this:

    <?xml version="1.0" encoding="UTF-8"?>
  <channels>
<!-- vermin --><channel id="&amp;TV">1:0:1:D32E:836:2:11A0000:0:0:0:</channel><!-- VM -->
<!-- vermin --><channel id="4Music">1:0:1:2F17:7EF:2:11A0000:0:0:0:</channel><!-- VM -->
<!-- vermin --><channel id="4seven">1:0:1:5302:814:2:11A0000:0:0:0:</channel><!-- VM -->

I'm not even sure what would work best. Can this be done with Python? Batch?

TIA

1 solution

#1

import re

# Read the whole source file as UTF-8.
with open('test1.xml', encoding='utf-8') as r:
    content = r.read()

# Rewrite each <channel> line: xmltv_id becomes the id attribute and
# site_id becomes the element text, wrapped in the two comments.
# Flags are passed by keyword and combined with | (bitwise OR).
content = re.sub(
    r'^\s*(<channel)\s+.*?\s+site_id="(.*?)"\s+xmltv_id="(.*?)">.*?(</channel>)',
    r'<!-- vermin -->\1 id="\3">\2\4<!-- VM -->',
    content,
    flags=re.I | re.M,
)

# Write the result to a new file so the original stays untouched.
with open('test2.xml', 'w', encoding='utf-8') as w:
    w.write(content)

Python 3 is used since you mentioned Python in the summary of your question.

This uses regular expressions to modify the XML. If the XML has a reasonably constant structure, as in the example posted, then this may meet your goal.
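
If the attribute order or spacing varies between lines, the regex will leave those lines unchanged without warning. As an alternative to the approach in this answer (not part of it), here is a minimal sketch that parses the file with the standard library's xml.etree.ElementTree instead. The helper name convert_channels, the SAMPLE text, the closing </channels> tag (the posted sample is cut off) and the swapped attribute order on the second channel are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Sample input based on the question. Assumptions: a closing </channels>
# tag is added, and the second channel has its attributes in a different
# order to show that the order no longer matters.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<channels>\n'
    '    <channel update="i" site="openwebif" site_id="1:0:1:D32E:836:2:11A0000:0:0:0:" xmltv_id="&amp;TV">&amp;TV</channel>\n'
    '    <channel update="i" site="openwebif" xmltv_id="4Music" site_id="1:0:1:2F17:7EF:2:11A0000:0:0:0:">4Music</channel>\n'
    '</channels>\n'
)


def convert_channels(xml_bytes):
    """Hypothetical helper: build the converted document as a string."""
    root = ET.fromstring(xml_bytes)
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', '<channels>']
    for channel in root.iter('channel'):
        # Attributes are looked up by name and re-escaped for output.
        new_id = quoteattr(channel.get('xmltv_id', ''))
        site_id = escape(channel.get('site_id', ''))
        lines.append(
            f'<!-- vermin --><channel id={new_id}>{site_id}</channel><!-- VM -->'
        )
    lines.append('</channels>')
    return '\n'.join(lines) + '\n'


result = convert_channels(SAMPLE.encode('utf-8'))
print(result)
```

To run this on a real file, one option would be to read test1.xml in binary mode and pass the bytes to the helper. Unlike the regex, this requires the input to be complete, well-formed XML.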

test1.xml is read, and the modifications are made using a regular expression pattern with re.sub().
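
To see what re.sub() does before touching any files, the same pattern can be tried on a string held in memory. This is only a sketch: the variable names sample, pattern, replacement and converted are made up for the demo, and the sample holds the first lines from the question.

```python
import re

# First lines of the question's input, held in a string for the demo.
sample = '''<?xml version="1.0" encoding="UTF-8"?>
  <channels>
    <channel update="i" site="openwebif" site_id="1:0:1:D32E:836:2:11A0000:0:0:0:" xmltv_id="&amp;TV">&amp;TV</channel>
    <channel update="i" site="openwebif" site_id="1:0:1:2F17:7EF:2:11A0000:0:0:0:" xmltv_id="4Music">4Music</channel>
'''

# Same pattern and replacement as in the script above.
pattern = r'^\s*(<channel)\s+.*?\s+site_id="(.*?)"\s+xmltv_id="(.*?)">.*?(</channel>)'
replacement = r'<!-- vermin -->\1 id="\3">\2\4<!-- VM -->'

# Lines that do not match (the XML declaration, <channels>) pass through unchanged.
converted = re.sub(pattern, replacement, sample, flags=re.I | re.M)
print(converted)
```

The two channel lines come out in the requested form, with their leading spaces removed because the pattern's \s* consumes them.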

test2.xml is the XML file with the changes applied.

Both files are treated as utf-8.

See the Python documentation for the re module.

A brief overview of the regular expression syntax used:

^ matches the start of a line (of every line when re.M is set).

\s matches a whitespace character.

* matches 0 or more of the previous pattern or character.

+ matches 1 or more of the previous pattern or character.

(.*?) captures any characters as a group, non-greedily (as few as possible).

\1 in the replacement stands for the 1st group, \2 for the 2nd group, and so on.

re.I is the case-insensitive flag.

re.M is the multiline flag, so the line anchors ^ and $ match at the start and end of each line.
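
The items above can be tried one at a time in a Python session. The short strings below are invented for the demo and are not taken from the question.

```python
import re

line = '<channel site_id="A" xmltv_id="B">B</channel>'

# Greedy .* runs to the last quote on the line; non-greedy .*? stops at the first.
greedy = re.search(r'site_id="(.*)"', line).group(1)
lazy = re.search(r'site_id="(.*?)"', line).group(1)
print(greedy)  # A" xmltv_id="B
print(lazy)    # A

# \1 and \2 in the replacement refer back to the captured groups.
swapped = re.sub(r'(\w+)=(\w+)', r'\2=\1', 'key=value')
print(swapped)  # value=key

text = 'first\n  <CHANNEL>\n'

# Without re.M, ^ only matches at the very start of the string.
without_m = re.findall(r'^\s*<channel>', text, flags=re.I)
# With re.M, ^ also matches at the start of the second line;
# re.I lets <channel> match <CHANNEL>.
with_m = re.findall(r'^\s*<channel>', text, flags=re.I | re.M)
print(without_m)  # []
print(with_m)     # ['  <CHANNEL>']
```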

I suggest reading the Python documentation, as it is more comprehensive for learning.
