I need to extract XML node elements into a CSV

Time: 2021-11-20 01:52:56

=XML FILE=

<?xml version="1.0" encoding="utf-8"?>
<weatherdata>
<location>
<name>Toronto</name>
<type/>
<country>CA</country>
<timezone/>
<location altitude="0" latitude="43.700111" longitude="-79.416298" geobase="geonames" geobaseid="0"/></location>
<credit/>
 <meta>
<lastupdate/>
  <calctime>1.4906</calctime>
<nextupdate/>
 </meta>
 <sun rise="2015-02-17T12:12:32" set="2015-02-17T22:50:42"/>
<forecast>
<time from="2015-02-17T15:00:00" to="2015-02-17T18:00:00">
  <symbol number="803" name="broken clouds" var="04d"/>
  <precipitation/>
  <windDirection deg="43.5048" code="NE" name="NorthEast"/>
  <windSpeed mps="1.82" name="Light breeze"/>
  <temperature unit="celsius" value="-13.29" min="-13.293" max="-13.29"/>
  <pressure unit="hPa" value="1007.77"/>
  <humidity value="100" unit="%"/>
  <clouds value="broken clouds" all="64" unit="%"/>
</time>
<time from="2015-02-17T18:00:00" to="2015-02-17T21:00:00">
  <symbol number="803" name="broken clouds" var="04d"/>
  <precipitation/>
  <windDirection deg="255.501" code="WSW" name="West-southwest"/>
  <windSpeed mps="0.66" name="Calm"/>
  <temperature unit="celsius" value="-10.16" min="-10.16" max="-10.16"/>
  <pressure unit="hPa" value="1006.44"/>
  <humidity value="100" unit="%"/>
  <clouds value="broken clouds" all="80" unit="%"/>
</time>

= DUMPER EXTRACT =

  'att' => {
      'to' => '2015-02-22T00:00:00',
      'from' => '2015-02-21T21:00:00'

  'att' => {
      'value' => '100',
      'unit' => '%'

  'next_sibling' => $VAR1->{'twig_root'}{'first_child'}{'next_sibling'}{'next_sibling'}{'next_sibling'}{'next_sibling'}{'last_child'}{'prev_sibling'}{'last_child'}{'prev_sibling'},
  'att' => {
      'unit' => 'hPa',
      'value' => '1020.87'

  'prev_sibling' => bless( {
      'att' => {
          'min' => '-8.313',
          'max' => '-8.313',
          'unit' => 'celsius',

I am looking to extract from the XML file:

'from' (only the time), 'humidity value' (the value), 'temperature max' (the temp value), 'temperature min' (the temp value), 'pressure value' (the hPa value)

The code below is my draft, written to see if I was on the right track. The intention was to get it working with a few nodes and output them to a CSV file. I am not getting anywhere...

= PERL CODE =

use strict;

use Data::Dumper;
use XML::Simple 'XMLin';

my $input_xml = '/var/egridmanage_pl/data/longrange.xml' ||die $!;
my $output_csv = '/var/egridmanage_pl/longrange.csv';

my $parse = XMLin('/var/egridmanage_pl/data/longrange.xml',forcearray => ['value']);

foreach my $dataset (@{$parse->{weatherdata}}) {
if ($dataset->{name} eq 'Toronto') {
    open my $out, ">", $output_csv or die "Could not open $output_csv: $!";
print {$out} $dataset->{att}-> {from} . "\n";
print {$out} $dataset->{att}->[0]->{value} . "\n";
}
}

= INTENDED RESULTS WOULD BE THE FOLLOWING = (I NEED HELP!!)

time      | humidity |  hPa    | min    | max    |

15:00:00  | 100      | 1007.77 | -13.29 | -13.29 | 

2 answers

#1


4  

Let me suggest something radically different. Since your input is an XML document, you could use XSLT to extract data from it.

Your Perl code would then only have to execute this transformation; everything else would be handled in an XSLT stylesheet. You'd have to use a library that includes an XSLT processor, and in my opinion XML::LibXML and XML::LibXSLT would be the safest choice (sample code taken from here):

use XML::LibXSLT;
use XML::LibXML;

my $xslt = XML::LibXSLT->new();

my $source = XML::LibXML->load_xml(location => 'foo.xml');
my $style_doc = XML::LibXML->load_xml(location=>'bar.xsl', no_cdata=>1);

my $stylesheet = $xslt->parse_stylesheet($style_doc);

my $results = $stylesheet->transform($source);

print $stylesheet->output_as_bytes($results);

Assuming well-formed input XML, use the following transformation.

XSLT Stylesheet

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="UTF-8" />

    <xsl:strip-space elements="*"/>

    <xsl:template match="/">
        <xsl:text>time|humidity|hPa|min|max|&#xA;</xsl:text>
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="time">
        <xsl:value-of select="concat(@from,'|',humidity/@value, '|', pressure/@value,'|', temperature/@min, '|', temperature/@max, '|')"/>
        <xsl:if test="following::time">
            <xsl:text>&#xA;</xsl:text>
        </xsl:if>
    </xsl:template>

    <xsl:template match="text()"/>

</xsl:transform>

Text Output

time|humidity|hPa|min|max|
2015-02-17T15:00:00|100|1007.77|-13.293|-13.29|
2015-02-17T18:00:00|100|1006.44|-10.16|-10.16|
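
The question asked for only the time part of 'from' and for CSV output, which the stylesheet above does not quite produce. The following is a minimal sketch of one way to adapt it, not the answer's own code: it assumes XML::LibXML and XML::LibXSLT are installed, swaps the pipe for a comma, and uses the XPath 1.0 function substring-after() to drop the date. The helper name forecast_to_csv and the trimmed sample XML are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Variant of the stylesheet (an assumption, not the answer's exact text):
# substring-after(@from, 'T') keeps only the clock time of the timestamp.
my $xsl = <<'XSL';
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/">
    <xsl:text>time,humidity,hPa,min,max&#xA;</xsl:text>
    <xsl:apply-templates select="//time"/>
  </xsl:template>
  <xsl:template match="time">
    <xsl:value-of select="concat(substring-after(@from, 'T'), ',', humidity/@value, ',', pressure/@value, ',', temperature/@min, ',', temperature/@max)"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:transform>
XSL

# Hypothetical helper: run the stylesheet over an XML string and
# return the resulting text.
sub forecast_to_csv {
    my ($xml) = @_;
    my $source     = XML::LibXML->load_xml( string => $xml );
    my $style_doc  = XML::LibXML->load_xml( string => $xsl );
    my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
    my $results    = $stylesheet->transform($source);
    return $stylesheet->output_as_bytes($results);
}

# Trimmed sample input holding only the elements the stylesheet reads.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<weatherdata>
<forecast>
<time from="2015-02-17T15:00:00" to="2015-02-17T18:00:00">
  <temperature unit="celsius" value="-13.29" min="-13.293" max="-13.29"/>
  <pressure unit="hPa" value="1007.77"/>
  <humidity value="100" unit="%"/>
</time>
</forecast>
</weatherdata>
XML

print forecast_to_csv($xml);
```

To produce a file instead of printing to the terminal, the returned string can be printed to a filehandle opened on the target CSV path.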

#2


2  

Whilst you have an answer, you've tagged it as Perl, so I'll contribute something perlish. First off though: don't use XML::Simple. From its docs:

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces.

Personally, I like XML::Twig when it comes to XML parsing. It goes something like this. (Note: I've cut down your XML, because yours is incomplete and thus invalid. That shouldn't matter, because this code only inspects the <time> elements.)

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

sub time_handler {
    my ( $twig, $time ) = @_;

    # Join the fields with tabs; the newline is printed separately so the
    # row does not end with a stray tab.
    print join( "\t",
        $time->att('from'),
        $time->first_child('humidity')->att('value'),
        $time->first_child('pressure')->att('value'),
        $time->first_child('temperature')->att('min'),
        $time->first_child('temperature')->att('max') ),
        "\n";

    # Discard elements that have already been processed to keep memory use low.
    $twig->purge;
}

local $/;
my $parser = XML::Twig->new( twig_handlers => { 'time' => \&time_handler } )
    ->parse(<DATA>);

__DATA__
<?xml version="1.0" encoding="utf-8"?>
<weatherdata>
<time from="2015-02-17T15:00:00" to="2015-02-17T18:00:00">
  <symbol number="803" name="broken clouds" var="04d"/>
  <precipitation/>
  <windDirection deg="43.5048" code="NE" name="NorthEast"/>
  <windSpeed mps="1.82" name="Light breeze"/>
  <temperature unit="celsius" value="-13.29" min="-13.293" max="-13.29"/>
  <pressure unit="hPa" value="1007.77"/>
  <humidity value="100" unit="%"/>
  <clouds value="broken clouds" all="64" unit="%"/>
</time>
<time from="2015-02-17T18:00:00" to="2015-02-17T21:00:00">
  <symbol number="803" name="broken clouds" var="04d"/>
  <precipitation/>
  <windDirection deg="255.501" code="WSW" name="West-southwest"/>
  <windSpeed mps="0.66" name="Calm"/>
  <temperature unit="celsius" value="-10.16" min="-10.16" max="-10.16"/>
  <pressure unit="hPa" value="1006.44"/>
  <humidity value="100" unit="%"/>
  <clouds value="broken clouds" all="80" unit="%"/>
</time>
</weatherdata>
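
The script above prints the full timestamp, tab-separated, to the terminal. As a rough sketch of how the same XML::Twig handler could get closer to the output the question asked for (time only, comma-separated), the version below collects the rows first and prints them afterwards. The helper name forecast_rows, the regex used to cut the time out of 'from', and the trimmed sample XML are assumptions for illustration; it also assumes every <time> element has humidity, pressure and temperature children.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: parse an XML string and return one array ref
# (time, humidity, pressure, min, max) per <time> element.
sub forecast_rows {
    my ($xml) = @_;
    my @rows;
    my $twig = XML::Twig->new(
        twig_handlers => {
            'time' => sub {
                my ( $t, $time ) = @_;

                # Keep only the clock time: everything after the 'T'.
                my ($clock) = $time->att('from') =~ /T(.+)\z/;
                my $temp = $time->first_child('temperature');
                push @rows, [
                    $clock,
                    $time->first_child('humidity')->att('value'),
                    $time->first_child('pressure')->att('value'),
                    $temp->att('min'),
                    $temp->att('max'),
                ];

                # Free the elements parsed so far.
                $t->purge;
            },
        },
    );
    $twig->parse($xml);
    return @rows;
}

# Trimmed sample input holding only the elements the handler reads.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<weatherdata>
<time from="2015-02-17T15:00:00" to="2015-02-17T18:00:00">
  <temperature unit="celsius" value="-13.29" min="-13.293" max="-13.29"/>
  <pressure unit="hPa" value="1007.77"/>
  <humidity value="100" unit="%"/>
</time>
<time from="2015-02-17T18:00:00" to="2015-02-17T21:00:00">
  <temperature unit="celsius" value="-10.16" min="-10.16" max="-10.16"/>
  <pressure unit="hPa" value="1006.44"/>
  <humidity value="100" unit="%"/>
</time>
</weatherdata>
XML

# Print a header line followed by one comma-separated line per row.
print join( ',', qw(time humidity hPa min max) ), "\n";
print join( ',', @$_ ), "\n" for forecast_rows($xml);
```

To write a file rather than the terminal, open a filehandle on the CSV path, as the question's draft already does, and print to that handle. None of these values contain commas or quotes; for data that might, a dedicated CSV module would be the safer choice than a plain join.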
