Search and replace in multiple XML files in a folder using Perl

Time: 2022-03-15 16:51:26

I am new to Perl and have been given the task of editing multiple XML files in a folder with it. I tried a few Perl one-liners, but they did not help; what I need is Perl code that performs a replacement across every file in a selected folder. I also tried the code from the post "Replace values for multiple XML files in a folder using perl", but that did not work either. Please be gentle, since I am new. Below is the code I tried from that Stack Overflow post, followed by the errors it produces; please take a look and suggest a solution.

my $dir = ***D:\Perl***;
my $d = opendir();
map {
    if (
        -f "$dir/$_"
        && ($_ =~ "\.xml$")
    ) {
        open (my $input_file, '<', ) or die "unable to open $input_file $!\n";

        my $input;
        {
            local $/;               #Set record separator to undefined.
            $input = <$input_file>; #This allows the whole input file to be read at once.
        }
        close $input_file;

        $input =~ s/Comment//g;

        open (my $output_file, '>', "$dir/$_") or die "unable to open $output_file $!\n";
        print {$output_file} $input;

        close $output_file or die $!;
    }
} readdir($d);
closedir($d);

Error:

syntax error at hello3.pl line 10, near "=~ "\.xml$""
Global symbol "$dir" requires explicit package name at hello3.pl line 23.
Global symbol "$output_file" requires explicit package name at hello3.pl line 23.
syntax error at hello3.pl line 28, near "}"
Global symbol "$d" requires explicit package name at hello3.pl line 28.
Global symbol "$d" requires explicit package name at hello3.pl line 29.
Execution of hello3.pl aborted due to compilation errors.

XML files are in the folder D:\Perl\

1.xml
2.xml
3.xml

The contents of each XML file are as follows:

<?xml version="1.0">
<root>
<!--This is my comment line 1-->
<subtag>
<element>This is 1.xml file</element>
</subtag>
</root>

1 Answer

#1



I'm impressed that, as a newcomer to Perl, you've latched on to map. map is designed to transform one list into another (very often an array into a hash), and it does this by evaluating a code block for each element.

However, using it the way you have is pretty nasty, because it creates code that's hard to follow. Why not use a for (or foreach) loop instead? The key warning sign is 'am I assigning the result of map to something, typically a hash (or hashref)?' If the answer is no, then chances are this isn't a good way to do it.
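
To make that distinction concrete, here is a minimal sketch. The file list is made up for illustration (notes.txt is not one of your files): map is used where its return value is kept, and foreach where the loop exists only for its side effects.

```perl
use strict;
use warnings;

# Hypothetical directory listing; notes.txt is an assumption for the demo.
my @files = ( '1.xml', '2.xml', 'notes.txt', '3.xml' );

# map used as intended: its return value is kept (here, a lookup hash).
my %is_xml = map { $_ => 1 } grep { /\.xml\z/ } @files;

# foreach used for side effects: nothing is returned, work happens per item.
my @processed;
foreach my $file ( sort keys %is_xml ) {
    push @processed, "would process $file";
}

print "$_\n" for @processed;
```

In your script the map block opens, rewrites and closes files but its return value is thrown away, which is exactly the case where the foreach form reads better.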

Also: I tend to prefer glob over opendir for this style of iteration operation.
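
As a rough illustration of why, the sketch below builds a scratch folder (standing in for D:\Perl, purely an assumption so the example runs anywhere) and lists its XML files with glob. There is no opendir/readdir/closedir, no -f test and no filename regex.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch folder standing in for D:\Perl (an assumption for the demo).
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(1.xml 2.xml readme.txt)) {
    open( my $fh, '>', "$dir/$name" ) or die "Cannot create $name: $!";
    close($fh) or die "Cannot close $name: $!";
}

# glob returns paths that already match the pattern and already include
# the folder name, so they can be passed straight to open().
my @xml_files = sort glob("$dir/*.xml");

print "$_\n" for @xml_files;
```

One caveat: the built-in glob splits its argument on whitespace, so a folder name containing spaces needs extra care (for example bsd_glob from File::Glob).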

But most importantly of all:

Don't use regular expressions and line based parsing for XML

Please, please, please use an XML parser to parse XML. Doing so via regular expressions is just nasty - it makes for brittle, unreliable code. There are a bunch of things in the XML spec that let semantically identical XML (which is therefore 'valid' from the perspective of an upstream system) fail to match your regular expressions: things like unary (self-closing) tags, line wrapping, and tags split across lines.

As an example:

<XML
><some_tag
att1="1"
att2="2"
att3="3"
></some_tag></XML>

Or:

<XML><some_tag att1="1" att2="2" att3="3"></some_tag></XML>

Or:

<XML>
  <some_tag
      att1="1"
      att2="2"
      att3="3"></some_tag>
</XML>

Or:

<XML>
  <some_tag att1="1" att2="2" att3="3"></some_tag>
</XML>

Or:

<XML>
  <some_tag att1="1" att2="2" att3="3"/>
</XML>

All of these 'say' basically the same thing (technically there's a minor difference between 'no text' and 'null text' in the last example), but as I hope you can clearly see, a line- and regex-based test that encompasses all of them would be difficult to write. Which is why I keep on suggesting "use a parser" every time this comes up.
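
A small sketch of the point, using three of the layouts above. The single-line pattern is only my guess at the sort of regex someone might write, not anything from your code. It matches one layout, while XML::Twig reports the same attributes for all three.

```perl
use strict;
use warnings;
use XML::Twig;

# Three of the equivalent layouts shown above.
my @variants = (
    qq{<XML\n><some_tag\natt1="1"\natt2="2"\natt3="3"\n></some_tag></XML>},
    qq{<XML><some_tag att1="1" att2="2" att3="3"></some_tag></XML>},
    qq{<XML>\n  <some_tag att1="1" att2="2" att3="3"/>\n</XML>},
);

# A naive single-line pattern (an assumption, for illustration only).
my $naive = qr{<some_tag att1="1" att2="2" att3="3"></some_tag>};

my ( @seen_by_regex, @seen_by_parser );
for my $xml (@variants) {
    push @seen_by_regex, ( $xml =~ $naive ? 1 : 0 );

    my $twig = XML::Twig->new;
    $twig->parse($xml);
    my $tag  = $twig->root->first_child('some_tag');
    my $atts = $tag->atts;    # hashref of attribute name => value
    push @seen_by_parser,
        join( ',', map { $_ . '=' . $atts->{$_} } sort keys %$atts );
}

print "regex matched: @seen_by_regex\n";
print "parser saw:    $_\n" for @seen_by_parser;
```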

With that in mind - you probably don't actually need to remove comments at all - because they're part of the XML spec, and it's far better to handle them as part of the parse process.

I like XML::Twig and perl for this. Other modules exist though, and it may be that you get on better with one of the others (like XML::LibXML) instead.
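
For comparison, here is a minimal sketch of comment removal with XML::LibXML, working on a string copy of your sample (with the declaration fixed) rather than on files. Treat it as one possible approach under those assumptions, not a recommendation over XML::Twig.

```perl
use strict;
use warnings;
use XML::LibXML;

# The sample document from the question, with a well-formed declaration.
my $xml = <<'XML';
<?xml version="1.0"?>
<root>
<!--This is my comment line 1-->
<subtag>
<element>This is 1.xml file</element>
</subtag>
</root>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# comment() is the XPath node test that selects comment nodes;
# unbindNode detaches each one from the document tree.
$_->unbindNode for $doc->findnodes('//comment()');

my $cleaned = $doc->toString;
print $cleaned;
```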

Oh, and there's an error in your XML: the declaration is missing its closing question mark. The line should be:

<?xml version="1.0"?>

Anyway, with that in mind - to answer your question as asked:

Removing comments from some XML

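#!/usr/local/bin/perl
use strict;
use warnings;

use XML::Twig;

# Folder holding the XML files (D:\Perl in the question);
# forward slashes work on Windows too.
my $dir = 'D:/Perl';

foreach my $file ( glob("$dir/*.xml") ) {
    # comments => 'drop' makes the parser discard comments as it reads.
    my $twig =
        XML::Twig->new( comments => 'drop', pretty_print => 'indented_a' );
    $twig->parsefile($file);

    # Write the result alongside the original, as "<name>.xml.new".
    open( my $output, ">", $file . ".new" )
        or die "Cannot write $file.new: $!";
    print {$output} $twig->sprint;
    close($output) or die "Cannot close $file.new: $!";
}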

Once the XML declaration is corrected as above, this will turn your sample XML into:

<?xml version="1.0"?>
<root>
  <subtag>
    <element>This is 1.xml file</element>
  </subtag>
</root>
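
If you want the cleaned output to replace the original file rather than sit next to it, one option is to write the new version first and rename it over the original only once the write has succeeded. A minimal sketch, using a scratch folder and a shortened sample file as stand-ins for D:\Perl\1.xml (both are assumptions so the example is self-contained):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Twig;

# Scratch folder and file standing in for D:\Perl\1.xml (demo assumption).
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/1.xml";
open( my $in, '>', $file ) or die "Cannot create $file: $!";
print {$in} qq{<?xml version="1.0"?>\n<root>\n<!--a comment-->\n}
    . qq{<subtag><element>This is 1.xml file</element></subtag>\n</root>\n};
close($in) or die "Cannot close $file: $!";

my $twig = XML::Twig->new( comments => 'drop', pretty_print => 'indented_a' );
$twig->parsefile($file);

# Write the new version next to the original, then swap it into place
# only after the write has fully succeeded.
my $tmp = "$file.new";
open( my $out, '>', $tmp ) or die "Cannot write $tmp: $!";
print {$out} $twig->sprint;
close($out) or die "Cannot close $tmp: $!";
rename( $tmp, $file ) or die "Cannot replace $file: $!";

# Read the file back to confirm what is now on disk.
open( my $check, '<', $file ) or die "Cannot reopen $file: $!";
my $content = do { local $/; <$check> };
close($check);
print $content;
```

Keeping a backup copy of the folder before running anything like this over real files is a sensible precaution.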

Deleting 'non comment' elements

If you wanted to delete something other than comments (bearing in mind that comments are a special case) and instead wanted to, say, get rid of a particular element:

XML::Twig->new( pretty_print => 'indented_a',
                twig_handlers => { 'element' => sub { $_ -> delete } } );

Note - this will delete every tag named element. You can apply more selective criteria either via an XPath-style expression (e.g. 'subtag/element') or by using a proper subroutine to inspect each match:

sub delete_element_with_file {
    my ( $twig, $element ) = @_;
    if ( $element->text =~ m/file/ ) { $element->delete }
}


my $twig = XML::Twig->new(
    pretty_print  => 'indented_a',
    twig_handlers => { 'subtag/element' => \&delete_element_with_file }
);

##etc. 
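
Putting those pieces together, here is a minimal runnable sketch against a string version of your sample. The second element ("keep me") is something I added purely so you can see that only matching elements are removed.

```perl
use strict;
use warnings;
use XML::Twig;

# Sample document; the "keep me" element is a hypothetical addition.
my $xml = <<'XML';
<?xml version="1.0"?>
<root>
<subtag>
<element>This is 1.xml file</element>
<element>keep me</element>
</subtag>
</root>
XML

# Called once per subtag/element match; deletes it if its text
# contains the word "file".
sub delete_element_with_file {
    my ( $twig, $element ) = @_;
    $element->delete if $element->text =~ m/file/;
}

my $twig = XML::Twig->new(
    pretty_print  => 'indented_a',
    twig_handlers => { 'subtag/element' => \&delete_element_with_file },
);
$twig->parse($xml);

my $result = $twig->sprint;
print $result;
```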
