How do I increment a number in a large XML file using awk or Perl?

Time: 2022-10-04 09:21:58

I have an XML file with the following line:

            <VALUE DECIMAL_VALUE="0.2725" UNIT_TYPE="percent"/>

I would like to increment this value by .04 and keep the format of the XML in place. I know this is possible with a Perl or awk script, but I am having difficulty with the expressions to isolate the number.

5 answers

#1


If you're on a box with the xsltproc command in place I would suggest you use XSLT for this.

For a Perl solution I'd use the DOM; see the article "DOM Processing with Perl".

That said, if your XML file is produced in a predictable way, something naive like the following could work:

perl -pe 's#(<VALUE DECIMAL_VALUE=")([0-9.]+)(" UNIT_TYPE="percent"/>)#"$1" . ($2 + 0.04) . "$3"#e;'

#2


If you are absolutely sure that the format of your XML will never change, that the order of the attributes is fixed, that you can indeed get the regexp for the number right... then go for the non-parser based solution.

Personally I would use XML::Twig (maybe because I wrote it ;--). It will process the XML as XML, while still respecting the original format of the file, and won't load it all in memory before starting to work.

Untested code below:

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

XML::Twig->new( # call the sub for each VALUE element with a DECIMAL_VALUE attribute
                twig_roots => { 'VALUE[@DECIMAL_VALUE]' => \&upd_decimal },
                # print anything else as is
                twig_print_outside_roots => 1,
              )
         ->parsefile_inplace( 'foo.xml');

sub upd_decimal
  { my( $twig, $value)= @_; # twig is the XML::Twig object, $value the element
    my $decimal_value= $value->att( 'DECIMAL_VALUE');
    $decimal_value += 0.04;
    $value->set_att( DECIMAL_VALUE => $decimal_value);
    $value->print;
  }

#3


This takes input on stdin, outputs to stdout:

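use strict;
use warnings;

while (<>) {
    # [^"]* stops at the closing quote, so the attributes after it are kept
    if ( /^(.*DECIMAL_VALUE=")([^"]*)(".*)$/ ) {
        my $new_val = $2 + 0.04;
        print "$1$new_val$3\n";
    }
    else {
        print;
    }
}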
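
One formatting detail the addition does not control: the sum is printed with the default number formatting, so 0.2725 + 0.04 comes out as 0.3125, but 0.26 + 0.04 would be printed as 0.3. If the file is expected to keep a fixed number of decimals, the result can be formatted explicitly. The sketch below does this in awk; the helper name `bump_fixed` and the choice of four decimals are assumptions based on the sample line, not something stated in the question.

```shell
# Hypothetical helper: add a delta to the first DECIMAL_VALUE on each line
# and always write the result with four decimal places.
bump_fixed() {
  awk -v delta="$1" '
    match($0, /DECIMAL_VALUE="[0-9.]+"/) {
      # 15 is the length of DECIMAL_VALUE=" and 16 adds the closing quote
      num  = substr($0, RSTART + 15, RLENGTH - 16)
      head = substr($0, 1, RSTART - 1)
      tail = substr($0, RSTART + RLENGTH)
      $0 = head sprintf("DECIMAL_VALUE=\"%.4f\"", num + delta) tail
    }
    { print }'
}

printf '%s\n' '    <VALUE DECIMAL_VALUE="0.26" UNIT_TYPE="percent"/>' | bump_fixed 0.04
```

Assigning the rebuilt string to `$0` as a whole, rather than to a single field, leaves the indentation as it was.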

#4


Something akin to the following will work. It may need tweaking if there is extra spacing, but that is left as an exercise for the reader.

function update_after(in_string, locate_string, delta,    local_pos, leadin, leadout, new_value, quote_pos) {
    local_pos = index(in_string, locate_string);
    leadin    = substr(in_string, 1, local_pos - 1);   # awk strings are 1-indexed
    leadout   = substr(in_string, local_pos + length(locate_string));
    new_value = leadout + delta;                       # uses the leading number of leadout
    quote_pos = index(leadout, "\"");
    leadout   = substr(leadout, quote_pos + 1);
    return leadin locate_string new_value "\"" leadout;
}

/^ *<VALUE .*DECIMAL_VALUE="/ {
    print update_after($0, "DECIMAL_VALUE=\"", 0.04);
    next;
}

{ print }   # pass every other line through unchanged

#5


Here's a gawk version:

awk '/DECIMAL_VALUE/{
 for(i=1;i<=NF;i++){
    if( $i~/DECIMAL_VALUE/){
        gsub(/DECIMAL_VALUE=|\042/,"",$i)
        # note: assigning to $i rebuilds $0, so indentation collapses to single spaces
        $i="DECIMAL_VALUE=\042" ($i+0.04) "\042"
    }
 }
}1' file
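
Because the field-based version above assigns to `$i`, awk rebuilds the line with single spaces and the original indentation is lost. A possible variant that keeps the line byte-for-byte apart from the number is sketched below. It uses only POSIX awk features (`match`, `substr`, `RSTART`, `RLENGTH`); the helper name `bump_decimal` is made up for illustration, and the pattern assumes unsigned values written with digits and a dot.

```shell
# Hypothetical helper: add a delta to every DECIMAL_VALUE="..." read from stdin,
# leaving indentation and all other attributes untouched.
bump_decimal() {
  awk -v delta="$1" '
    {
      key  = "DECIMAL_VALUE=\""
      out  = ""
      rest = $0
      # match() sets RSTART and RLENGTH for the first occurrence left in rest
      while (match(rest, /DECIMAL_VALUE="[0-9.]+"/)) {
        num  = substr(rest, RSTART + length(key), RLENGTH - length(key) - 1)
        out  = out substr(rest, 1, RSTART - 1) key (num + delta) "\""
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }'
}

printf '%s\n' '            <VALUE DECIMAL_VALUE="0.2725" UNIT_TYPE="percent"/>' | bump_decimal 0.04
```

Since the match is on the attribute itself rather than on a whitespace-separated field, it does not matter where `DECIMAL_VALUE` sits among the attributes. Usage on a real file would be along the lines of `bump_decimal 0.04 < file.xml > file.new.xml`.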
