
Time: 2022-03-15 16:51:38

I have found several similar questions, but none quite reach my goal. I'm trying to edit multiple lines in an XML file. My knowledge of scripting is very basic at best, so please include some details my basic brain will understand.


I'm trying to convert this


    <?xml version="1.0" encoding="UTF-8"?>
    <channel update="i" site="openwebif" site_id="1:0:1:D32E:836:2:11A0000:0:0:0:" xmltv_id="&amp;TV">&amp;TV</channel>
    <channel update="i" site="openwebif" site_id="1:0:1:2F17:7EF:2:11A0000:0:0:0:" xmltv_id="4Music">4Music</channel>
    <channel update="i" site="openwebif" site_id="1:0:1:5302:814:2:11A0000:0:0:0:" xmltv_id="4seven">4seven</channel>

into this

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- vermin --><channel id="&amp;TV">1:0:1:D32E:836:2:11A0000:0:0:0:</channel><!-- VM -->
    <!-- vermin --><channel id="4Music">1:0:1:2F17:7EF:2:11A0000:0:0:0:</channel><!-- VM -->
    <!-- vermin --><channel id="4seven">1:0:1:5302:814:2:11A0000:0:0:0:</channel><!-- VM -->

I'm not even sure what would work best. Can this be done with Python? Batch?



1 solution



    import re

    # Open the XML file and read its contents whole.
    with open('test1.xml', encoding='utf-8') as r:
        content = r.read()

    # Do replacements using regex.
    content = re.sub(r'^\s*(<channel)\s+.*?\s+site_id="(.*?)"\s+xmltv_id="(.*?)">.*?(</channel>)',
                     r'<!-- vermin -->\1 id="\3">\2\4<!-- VM -->', content,
                     flags=re.I | re.M)

    # Open and write the changed XML file.
    with open('test2.xml', 'w', encoding='utf-8') as w:
        w.write(content)

Python 3 is used since you mentioned Python in the summary of your question.


This uses regular expressions to modify the XML. If the XML has a reasonably constant structure, as in the example posted, then this may meet your goal.


test1.xml is read and the modifications are done using a Regular Expression pattern with re.sub().


test2.xml is the XML file with the changes applied.


Both files are treated as utf-8.
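To see what the substitution does without touching any files, here is a small sketch that applies the same pattern to two of the sample lines from the question held in a string. The names `sample`, `PATTERN`, `REPLACEMENT` and `convert` are made up for this illustration.

```python
import re

# Sample input copied from the question (two channels are enough to see the effect).
sample = '''<?xml version="1.0" encoding="UTF-8"?>
<channel update="i" site="openwebif" site_id="1:0:1:D32E:836:2:11A0000:0:0:0:" xmltv_id="&amp;TV">&amp;TV</channel>
<channel update="i" site="openwebif" site_id="1:0:1:2F17:7EF:2:11A0000:0:0:0:" xmltv_id="4Music">4Music</channel>
'''

# Same pattern and replacement as in the script above.
PATTERN = r'^\s*(<channel)\s+.*?\s+site_id="(.*?)"\s+xmltv_id="(.*?)">.*?(</channel>)'
REPLACEMENT = r'<!-- vermin -->\1 id="\3">\2\4<!-- VM -->'


def convert(text):
    """Rewrite every <channel ...> line in text; other lines are left alone."""
    return re.sub(PATTERN, REPLACEMENT, text, flags=re.I | re.M)


result = convert(sample)
print(result)
```

The XML declaration on the first line does not match the pattern, so it passes through unchanged, while each channel line is rewritten into the shorter form.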


Read the Python help file about the re module.


A brief overview of the regular expressions used:


  • ^ matches the start of a line.

  • \s matches a whitespace character.

  • * matches 0 or more of the previous pattern|character.

  • + matches 1 or more of the previous pattern|character.

  • (.*?) captures any characters as a group, non-greedily (as few as possible).

  • \1 is the 1st group in the replacement. \2 is the 2nd group...

  • re.I is the case-insensitive flag.

  • re.M is the multiline flag, so the line anchors ^ and $ match at every line, not only at the start and end of the whole text.
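The items in this list can be tried out one at a time. The snippet below is only an illustration on a made-up two-line string (`text` and the result names are not from the script above); it shows greedy versus non-greedy matching, the two flags, `\s*` versus `\s+`, and group references in a replacement.

```python
import re

# Made-up two-line text, just for experimenting.
text = 'id="1" x="2"\nID="3" x="4"'

# Greedy (.*) runs to the last quote on each line; non-greedy (.*?) stops at the first.
greedy = re.findall(r'"(.*)"', text)
lazy = re.findall(r'"(.*?)"', text)

# Without re.M, ^ only matches at the very start of the text.
# re.I additionally lets "id" match "ID".
start_only = re.findall(r'^id', text, flags=re.I)
every_line = re.findall(r'^id', text, flags=re.I | re.M)

# \s* accepts zero whitespace characters, \s+ needs at least one.
zero_ok = re.match(r'\s*x', 'x') is not None
one_needed = re.match(r'\s+x', 'x') is not None

# \1 and \2 in the replacement refer to the 1st and 2nd captured groups.
swapped = re.sub(r'(\w+)="(\d)"', r'\2=\1', 'id="1"')

print(greedy, lazy, start_only, every_line, zero_ok, one_needed, swapped)
```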

I suggest you read the Python help file, as it is more comprehensive for learning.
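If the attribute order in your file is not always the same, the regex will silently skip those lines. As an alternative (this is a different technique from the regex answer, not part of it), each channel line could be parsed with the standard library's `xml.etree.ElementTree`. The sample file as a whole has no single root element, so this sketch parses one line at a time and assumes each `<channel>` element sits on its own line. `convert_line` and `convert_text` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def convert_line(line):
    """Convert one <channel> line; return any other line unchanged."""
    stripped = line.strip()
    if not stripped.startswith('<channel'):
        return line
    try:
        elem = ET.fromstring(stripped)
    except ET.ParseError:
        # Not a complete element on this line, so leave it alone.
        return line
    site_id = elem.get('site_id')
    xmltv_id = elem.get('xmltv_id')
    if site_id is None or xmltv_id is None:
        return line
    # quoteattr() and escape() turn & back into &amp; where needed.
    return '<!-- vermin --><channel id={}>{}</channel><!-- VM -->'.format(
        quoteattr(xmltv_id), escape(site_id))


def convert_text(text):
    """Apply convert_line to every line of text."""
    return '\n'.join(convert_line(line) for line in text.splitlines())


# Attributes in a different order than in the question still convert.
reordered = '<channel xmltv_id="&amp;TV" site_id="1:0:1:D32E:836:2:11A0000:0:0:0:" site="openwebif" update="i">&amp;TV</channel>'
print(convert_line(reordered))
```

The regex version is shorter and fine for a file with a fixed layout; this version trades a few more lines for not caring about attribute order.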




