Why doesn't this Perl regular expression work?

Date: 2021-12-14 21:44:10

I have a Perl script that's supposed to match this string:

Sometimes, he says "hey fred, what's up?"

The script reports whether it found fred at the beginning, end, or middle of a word, or whether it found just "fred" on its own. So it also matches Alfred and Frederich.

Well, for this string it's supposed to say it found fred on its own, but it says it found fred at the beginning of a word. Here is the regex for the beginning-of-word case (it's in an if-elsif ladder that checks beginning of word, end of word, just fred, then middle of word):

if(/.*\s+[fF][rR][eE][dD][^ \t\r\n,.:;'"].*/){
    print "found fred at beginning of a word:\n    $_\n";
}

I used [^ \t\r\n,.:;'"] instead of \S in case the word is followed by some punctuation. Obviously it's not an exhaustive list of punctuation, but that doesn't matter for this example, since fred is followed by a comma.
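
To make the difference concrete, here is a minimal sketch comparing \S with the negated class on the sample text. The sub names are hypothetical and exist only for this illustration; the patterns are cut down to the part that follows "fred".

```perl
use strict;
use warnings;

# Both subs ask: is "fred" followed by one more character that belongs to the word?
# \S accepts any non-whitespace character, so a trailing comma counts.
sub followed_by_nonspace { return $_[0] =~ /fred\S/i ? 1 : 0 }

# The negated class from the question also rejects the listed punctuation.
sub followed_by_wordish { return $_[0] =~ /fred[^ \t\r\n,.:;'"]/i ? 1 : 0 }

for my $text ("hey fred, what's up?", "Frederich") {
    printf "%-22s \\S: %d  class: %d\n",
        $text, followed_by_nonspace($text), followed_by_wordish($text);
}
```

With \S the comma after "fred" is treated as part of the word, so the sample sentence would be reported as a beginning-of-word match; the negated class avoids that.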

This is in a foreach loop. If it matters, this is exercise 7-1 in Learning Perl, 5th ed.

Update

The exercise in the book is to write a Perl program that finds "fred" in a list of words. Then it asks whether the script finds fred in "Frederich" or "Alfred". After that it says to write a text file that talks about Fred Flintstone and his friends, and to use it as input to the script.

Also

I figured it out, sort of. I must have changed something while writing the question and then forgotten about it: when I tested again, instead of reporting the beginning of a word, it just said it found fred anywhere. So the problem wasn't that it thought fred was at the beginning of a word; it was that it didn't think fred was the whole word. I added [,.:;'"]?\s+ to the pattern that matches "fred" as a whole word, and it worked. I guess I should have thought about it a little more before asking :)
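
As a rough sketch of that fix, the snippet below tests a whole-word pattern built the same way and compares it with a word-boundary version. The sub names, the leading \s+ (the rest of the original pattern is not shown in the question), and the second and third sample lines are assumptions made for illustration.

```perl
use strict;
use warnings;

# Whole-word test in the style of the fix: "fred", an optional punctuation
# mark, then whitespace. The leading \s+ is an assumption.
sub fred_alone_punct {
    my ($line) = @_;
    return $line =~ /\s+fred[,.:;'"]?\s+/i ? 1 : 0;
}

# The same idea expressed with word boundaries.
sub fred_alone_boundary {
    my ($line) = @_;
    return $line =~ /\bfred\b/i ? 1 : 0;
}

for my $line (
    q{Sometimes, he says "hey fred, what's up?"},
    q{I am Frederick},
    q{it was fred},
) {
    printf "%-45s punct=%d boundary=%d\n",
        $line, fred_alone_punct($line), fred_alone_boundary($line);
}
```

Both versions accept the sample sentence and reject "Frederick". They differ on the last line: the punctuation-plus-whitespace pattern needs whitespace after "fred", so it misses a "fred" that ends the string, while \b does not have that requirement.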

3 answers

#1


If you want to match Fred and frederick but not Alfred, then your regex is:

/\bfred\w*\b/i

That is to say: a word boundary, followed by (case-insensitive) "fred", followed by zero or more word characters, followed by another word boundary. If you want frederick but not plain Fred, then:

/\bfred\w+\b/i

i.e., word boundary, "fred", one or more word chars, word boundary.
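
A quick way to see the difference between the two patterns is to run them over the three names mentioned above. This is only a sketch; the sub names are hypothetical.

```perl
use strict;
use warnings;

# \w* allows nothing after "fred", so plain "Fred" matches.
sub fred_then_anything { return $_[0] =~ /\bfred\w*\b/i ? 1 : 0 }

# \w+ requires at least one more word character, so plain "Fred" does not match.
sub fred_then_more { return $_[0] =~ /\bfred\w+\b/i ? 1 : 0 }

for my $word (qw(Fred frederick Alfred)) {
    printf "%-10s  \\w*: %d  \\w+: %d\n",
        $word, fred_then_anything($word), fred_then_more($word);
}
```

"Alfred" fails both patterns because there is no word boundary between "l" and "f".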

UPDATE: re-reading your question, it seems that you want:

perl -E '
use strict;
use warnings;
for( "nobody is here",
    "I am Frederick Flintsone",
    "she is alfredine",
    "I am Alfred Hitchcock",
    "fred has left the building" ) {
  say;
  if( ! /\b(\w*)fred(\w*)\b/i ) {
    say "no fred!"
  } elsif( ! length "$1$2" ) {
    say "fred by itself!"
  } elsif( ! length $2 ) {
    say "something-fred!"
  } elsif( ! length $1 ) {
    say "fred-something!"
  } else {
    say "something-fred-something!"
  }
}'

That outputs:

nobody is here
no fred!
I am Frederick Flintsone
fred-something!
she is alfredine
something-fred-something!
I am Alfred Hitchcock
something-fred!
fred has left the building
fred by itself!

#2


You can use \b for word boundaries and \w for word characters. Also, the /i modifier for case insensitivity is cleaner than writing [fF] and so on.

Something like:

if ($st =~ m{\b fred \w+ }xi) {
    print "Found fred at the beginning of a word";
} else {
    print "Not found";
}

If you need to look for 'fred' as a word itself, then use \b fred \b.
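
For instance, a whole-word check on the string from the question might look like the sketch below. The sub name is hypothetical; the pattern is the \b fred \b form with the /x and /i modifiers.

```perl
use strict;
use warnings;

# Under /x, unescaped whitespace inside the pattern is ignored, so the
# spaces around "fred" only aid readability.
sub is_whole_word_fred {
    my ($st) = @_;
    return $st =~ m{ \b fred \b }xi ? 1 : 0;
}

my $st = q{Sometimes, he says "hey fred, what's up?"};
print is_whole_word_fred($st)
    ? "Found fred as a whole word\n"
    : "Not found\n";
```

Because \b sits between a word character and a non-word character, the comma after "fred" needs no special handling.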

I'd recommend having a read of http://perldoc.perl.org/perlre.html

#3


Are you sure it doesn't work? It looks fine for your example case, and a slightly adjusted version of your code that I just ran gave the expected answer:

#!/usr/bin/perl

use strict; use warnings;

my $st = q{Sometimes, he says "hey fred, what's up?"};

foreach($st)
{
    if(/.*\s+[fF][rR][eE][dD][^ \t\r\n,.:;'"].*/){
        print "found fred at beginning of a word:\n    $_\n";
    }
    else
    {
        print "not found in $_";
    }
}

It reports the 'not found' branch (as expected, since I'm not doing the 'just fred' check).
