How can I use Perl to search an XML file for a specific string?

Time: 2022-06-01 19:00:35
$characterString = $verb[2];
$inputFile = $targetdirectory."/ppt/slides/slide".$slidenumber.".xml";

open FILE, "<$inputFile>";
  for (@lines) {
  if ($_ =~ /$characterString/) {
    print "Matched $characterString \n ";
  } else {
    print "Did not match $characterString\n";
}
}
close FILE;

Here is a sample from the XML file:

<a:t>Bailey</a:t></a:r></a:p><a:p><a:pPr lvl="1"><a:lnSpc><a:spcPct val="90000"/>

Here is the output:

PUB ENGINE: Version 5-26-2015
Did not match billybob
Did not match Bailey

Bailey is in the XML file, but billybob is not.

2 Answers

#1


The first two major issues:

  1. You are trying to open a file whose name ends with .xml>.

    open FILE, "<$inputFile>";
    

    should be

    open FILE, "<$inputFile";
    

    Well, not really. Better still, use the three-argument form of open with a lexical file handle, and check the result:

    open(my $FILE, '<', $inputFile)
       or die("Can't open \"$inputFile\": $!\n");
    

    This avoids a global file handle, prevents the file name from being treated as anything but a file name, and checks whether the open succeeded (a common point of failure).

  2. You never read from the file handle.

    for (@lines) {
    

    should be

    while (<FILE>) {
    

    Or if you adopted my suggested change,

    while (<$FILE>) {
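
Putting both fixes together, a minimal sketch of the corrected search might look like the following. The helper name `file_contains`, the use of a temporary file in place of the real slide XML, and the `\Q...\E` quoting are assumptions added for illustration; they are not part of the original question or answer. Unlike the original loop, this reports once per search string rather than once per line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return 1 if any line of $path contains $needle.
sub file_contains {
    my ($path, $needle) = @_;

    # Three-argument open with a lexical handle, checked for failure.
    open(my $fh, '<', $path)
        or die("Can't open \"$path\": $!\n");

    my $found = 0;
    while (my $line = <$fh>) {
        # \Q...\E quotes regex metacharacters so $needle matches literally.
        if ($line =~ /\Q$needle\E/) {
            $found = 1;
            last;
        }
    }
    close($fh);
    return $found;
}

# Stand-in for the real slide file: a temp file holding the sample line.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} '<a:t>Bailey</a:t></a:r></a:p><a:p><a:pPr lvl="1">', "\n";
close($tmp_fh);

for my $needle ('billybob', 'Bailey') {
    print file_contains($tmp_path, $needle)
        ? "Matched $needle\n"
        : "Did not match $needle\n";
}
```

In the real script, `$tmp_path` would be replaced by `$inputFile` and the loop by a single call with `$characterString`.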
    

#2


I would suggest that you're taking the wrong approach. XML does not lend itself to line-based or regex-based parsing: there are many ways to write semantically identical XML that will not match the same regular expression.

I've also had to adjust your XML a little, because the fragment as posted is not well-formed. Since you call it a 'sample', I am assuming the full file is. For future reference, it helps to post sample XML that is well-formed, meaning every tag that opens is also closed.

So I'm using this:

<root>
  <a:r>
    <a:p>
      <a:t>Bailey</a:t>
    </a:p>
  </a:r>
  <a:p>
    <a:pPr lvl="1">
      <a:lnSpc>
        <a:spcPct val="90000" />
      </a:lnSpc>
    </a:pPr>
  </a:p>
</root>

Note this can be written in a variety of ways:

<root
><a:r
><a:p
><a:t
>Bailey</a:t></a:p></a:r><a:p
><a:pPr
lvl="1"
><a:lnSpc
><a:spcPct
val="90000"
/></a:lnSpc></a:pPr></a:p></root>

Or:

<root><a:r><a:p><a:t>Bailey</a:t></a:p></a:r><a:p><a:pPr lvl="1"><a:lnSpc><a:spcPct val="90000"/></a:lnSpc></a:pPr></a:p></root>

All of these mean the same thing, which hopefully illustrates why line-based parsing is a bad idea. This may not entirely apply to your use case, but I'm a firm believer that using an XML parser whenever XML is involved is no bad thing.
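
To make the point concrete, here is a small sketch (not from the original answer) of a line-based check that succeeds on the compact form but fails on the split form. The helper `line_match` and the pattern it uses are hypothetical, chosen only to show how a regex that spans a tag and its text breaks when the tag is wrapped across lines.

```perl
use strict;
use warnings;

# Two serializations of the same element, taken from the variants above.
my $compact = "<a:t>Bailey</a:t>";
my $split   = "<a:t\n>Bailey</a:t>";

# Hypothetical line-based check: look for the tag and its text on one line.
sub line_match {
    my ($xml) = @_;
    for my $line (split /\n/, $xml) {
        return 1 if $line =~ /<a:t>Bailey/;
    }
    return 0;
}

printf "compact: %d\n", line_match($compact);   # the pattern fits on one line
printf "split:   %d\n", line_match($split);     # the tag is broken across lines
```

The first call returns 1 and the second returns 0, even though both strings describe the same element.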

Anyway, on to finding elements:

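#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# The file to search is taken from the command line.
my $inputFile = shift @ARGV
    or die("Usage: $0 FILE\n");
my $search = 'Bailey';

my $found;
XML::Twig->new(
    twig_handlers => {
        # '_all_' fires for every element; text() returns the element's
        # text content, including that of its descendants.
        # \Q...\E makes $search match as a literal string.
        '_all_' => sub { $found++ if $_->text =~ m/\Q$search\E/ }
    }
)->parsefile($inputFile);

if ($found) {
    print "Found $search\n";
}
else {
    print "Didn't find $search\n";
}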

Note that this only 'finds' the keyword in the text content of the XML, not in any of the attributes. This is usually more desirable than blindly matching against XML structure, attributes, and content alike.
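
As a rough illustration of that behaviour, the sketch below runs the same kind of handler over two of the serializations shown earlier, held in memory, and searches for both a text value and an attribute value. The helper `text_contains` is hypothetical, and feeding strings to `parse` instead of a file to `parsefile` is a choice made here to keep the example self-contained; it requires XML::Twig to be installed.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: 1 if any element's text content contains $needle.
sub text_contains {
    my ($xml, $needle) = @_;
    my $found = 0;
    XML::Twig->new(
        twig_handlers => {
            # $_ is the current element inside a twig handler.
            '_all_' => sub { $found = 1 if $_->text =~ /\Q$needle\E/ },
        },
    )->parse($xml);
    return $found;
}

# The single-line and the line-wrapped serializations from above.
my $compact = '<root><a:r><a:p><a:t>Bailey</a:t></a:p></a:r>'
            . '<a:p><a:pPr lvl="1"><a:lnSpc><a:spcPct val="90000"/>'
            . '</a:lnSpc></a:pPr></a:p></root>';
my $split = "<root\n><a:r\n><a:p\n><a:t\n>Bailey</a:t></a:p></a:r><a:p\n"
          . "><a:pPr\nlvl=\"1\"\n><a:lnSpc\n><a:spcPct\nval=\"90000\"\n"
          . "/></a:lnSpc></a:pPr></a:p></root>";

for my $needle ('Bailey', '90000') {
    printf "%-6s compact=%d split=%d\n", $needle,
        text_contains($compact, $needle), text_contains($split, $needle);
}
```

Under these assumptions, 'Bailey' should be found in both serializations, while '90000' should not be, because it only appears as an attribute value.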
