Ruby one-liner to capture regular expression matches

In Perl, I use the following one-line statements to pull matches out of a string via regular expressions and assign them. This one finds a single match and assigns it to a string:

my $string = "the quick brown fox jumps over the lazy dog.";

my $extractString = ($string =~ m{fox (.*?) dog})[0];

Result: $extractString == 'jumps over the lazy'

And this one creates an array from multiple matches:

my $string = "the quick brown fox jumps over the lazy dog.";

my @extractArray = $string =~ m{the (.*?) fox .*?the (.*?) dog};

Result: @extractArray == ['quick brown', 'lazy']

Is there an equivalent way to create these one-liners in Ruby?

3 Solutions

#1

string = "the quick brown fox jumps over the lazy dog."

extract_string = string[/fox (.*?) dog/, 1]
# => "jumps over the lazy"

extract_array = string.scan(/the (.*?) fox .*?the (.*?) dog/).first
# => ["quick brown", "lazy"]

This approach will also return nil (instead of throwing an error) if no match is found.

extract_string = string[/MISSING_CAT (.*?) dog/, 1]
# => nil

extract_array = string.scan(/the (.*?) MISSING_CAT .*?the (.*?) dog/).first
# => nil
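
A note on why .first is there: with capture groups, scan returns one sub-array per match, so the Perl-style list corresponds to the first element. A minimal sketch, assuming a made-up sample string and a hypothetical helper name (first_captures), neither of which comes from the question:

```ruby
# Hypothetical helper: captures of the first match, or nil.
def first_captures(text, pattern)
  text.scan(pattern).first
end

pairs = "a=1, b=2, c=3" # made-up sample text

# With capture groups, scan returns one sub-array per match.
p pairs.scan(/(\w)=(\d)/)
# prints [["a", "1"], ["b", "2"], ["c", "3"]]

# .first keeps only the captures of the first match.
p first_captures(pairs, /(\w)=(\d)/) # prints ["a", "1"]

# With no match, scan returns [], so .first is nil.
p first_captures(pairs, /(\w)=(x)/)  # prints nil
```

Dropping .first is the way to go when every occurrence is wanted rather than only the first one.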

#2

Use String#match and MatchData#[] or MatchData#captures to get the captured groups.

s = "the quick brown fox jumps over the lazy dog."

s.match(/fox (.*?) dog/)[1]
# => "jumps over the lazy"
s.match(/fox (.*?) dog/).captures
# => ["jumps over the lazy"]

s.match(/the (.*?) fox .*?the (.*?) dog/)[1..2]
# => ["quick brown", "lazy"]
s.match(/the (.*?) fox .*?the (.*?) dog/).captures
# => ["quick brown", "lazy"]

UPDATE

To avoid an undefined method [] error (match returns nil when nothing matches):

(s.match(/fox (.*?) cat/) || [])[1]
# => nil
(s.match(/the (.*?) fox .*?the (.*?) cat/) || [])[1..2]
# => nil
(s.match(/the (.*?) fox .*?the (.*?) cat/) || [])[1..-1] # instead of .captures
# => nil
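
On Ruby 2.3 or later, the safe navigation operator (&.) is another way to get nil instead of an error. This is only a sketch of that alternative, not part of the original answer; the helper names captures_or_nil and capture_at are made up for illustration:

```ruby
# Hypothetical helper: all captures, or nil when nothing matches.
def captures_or_nil(text, pattern)
  # &. calls captures only when match returned a MatchData.
  text.match(pattern)&.captures
end

# Hypothetical helper: one capture by index, or nil.
def capture_at(text, pattern, index)
  # MatchData#[] can be called through &. as well.
  text.match(pattern)&.[](index)
end

s = "the quick brown fox jumps over the lazy dog."

p captures_or_nil(s, /the (.*?) fox .*?the (.*?) dog/) # ["quick brown", "lazy"]
p captures_or_nil(s, /the (.*?) fox .*?the (.*?) cat/) # nil
p capture_at(s, /fox (.*?) dog/, 1)                    # "jumps over the lazy"
p capture_at(s, /fox (.*?) cat/, 1)                    # nil
```

Unlike the || [] trick, this keeps .captures usable on the success path.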

#3

First, be careful thinking in Perl terms when writing in Ruby. We do things a bit more verbosely to make the code more readable.

I'd write my @extractArray = $string =~ m{the (.*?) fox .*?the (.*?) dog}; as:

string = "the quick brown fox jumps over the lazy dog."

string[/the (.*?) fox .*?the (.*?) dog/]
extract_array = $1, $2
# => ["quick brown", "lazy"]

Ruby, like Perl, is aware of the capture groups and assigns them to the variables $1, $2, etc. Those make it very clean and clear when grabbing values and assigning them later. The regex engine also lets you create and assign named captures, but they tend to obscure what's happening, so, for clarity, I tend to go this way.
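
For anyone who prefers a spelled-out form, $1 and $2 are shorthand for Regexp.last_match(1) and Regexp.last_match(2). A small sketch, assuming a hypothetical helper named capture_pairs; note that any later match overwrites these values, so copy them right away:

```ruby
# Hypothetical helper comparing the two spellings.
def capture_pairs(text)
  # =~ returns nil on failure, and $1/$2 are then nil as well.
  return nil unless text =~ /the (.*?) fox .*?the (.*?) dog/

  {
    globals:    [$1, $2],
    last_match: [Regexp.last_match(1), Regexp.last_match(2)]
  }
end

string = "the quick brown fox jumps over the lazy dog."
result = capture_pairs(string)
p result[:globals]    # ["quick brown", "lazy"]
p result[:last_match] # ["quick brown", "lazy"]
p capture_pairs("no animals here") # nil
```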

We can use match to get there too:

/the (.*?) fox .*?the (.*?) dog/.match(string) # => #<MatchData "the quick brown fox jumps over the lazy dog" 1:"quick brown" 2:"lazy">

but is the end result more readable?

extract_array = /the (.*?) fox .*?the (.*?) dog/.match(string)[1..-1] 
# => ["quick brown", "lazy"]

The named captures are interesting too:

/the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/ =~ string
quick_brown # => "quick brown"
lazy # => "lazy"

But they leave the reader wondering where those variables were initialized and assigned; I sure don't look inside regular expressions for that to happen, so it's potentially confusing to others, and becomes a maintenance issue again.
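
One more wrinkle adds to the confusion: the local variables are only created when a regex literal sits on the left-hand side of =~. With the string on the left (or with String#match), no locals appear. A minimal sketch; the helper names and the group name adjectives are made up for illustration:

```ruby
# Hypothetical helper: literal on the left, so a local is assigned.
def literal_on_left(text)
  if /the (?<adjectives>.*?) fox/ =~ text
    adjectives
  end
end

# Hypothetical helper: string on the left, so no local is created.
def string_on_left(text)
  text =~ /the (?<adjectives>.*?) fox/
  defined?(adjectives) # nil, because no such local exists
end

string = "the quick brown fox jumps over the lazy dog."
p literal_on_left(string) # "quick brown"
p string_on_left(string)  # nil
```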

Cary says:

To elaborate a little on named captures, if match_data = string.match /the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/, then match_data[:quick_brown] # => "quick brown" and match_data[:lazy] # => "lazy" (as well as quick_brown # => "quick brown" and lazy # => "lazy"). With named captures available, I see no reason for using global variables or Regexp.last_match, etc.

Yes, but there's some smell there too.

We can use values_at with the MatchData result of match to retrieve the values captured, but there are some unintuitive behaviors in the class that turn me off:

/the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(string)['lazy']

works, and implies that MatchData knows how to behave like a Hash:

{'lazy' => 'dog'}['lazy'] # => "dog"

and it has a values_at method, like Hash, but it doesn't work intuitively:

/the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(string).values_at('lazy') # => 
# ~> -:6:in `values_at': no implicit conversion of String into Integer (TypeError)

Whereas:

/the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(string).values_at(2) # => ["lazy"]

which now acts like an Array:

['all captures', 'quick brown', 'lazy'].values_at(2) # => ["lazy"]

I want consistency and this makes my head hurt.
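
If a real Hash is what you want, MatchData#named_captures (Ruby 2.4 and later) returns one keyed by group name, and then values_at behaves the way Hash users expect. As far as I know, MatchData#values_at itself also accepts names on those newer versions, so the TypeError above may not reproduce there. A sketch, assuming a hypothetical helper named captures_by_name:

```ruby
# Hypothetical helper returning the captures keyed by group name.
def captures_by_name(text)
  m = /the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(text)
  # Fall back to an empty Hash when nothing matched.
  m ? m.named_captures : {}
end

string = "the quick brown fox jumps over the lazy dog."
captures = captures_by_name(string)

p captures["quick_brown"]                   # "quick brown"
p captures["lazy"]                          # "lazy"
p captures.values_at("lazy", "quick_brown") # ["lazy", "quick brown"]
p captures_by_name("no animals here")       # {}
```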
