How to match a string from an array at the beginning of another string

I want to case-insensitively match a string from my array, TOKENS, at the beginning of another string followed by a space or the end of the line.

My tokens array looks like:

2.4.0 :013 > TOKENS = ["m", "o"]
 => ["m", "o"]

When I try to match each element from my array, it is picking out the wrong results:

2.4.0 :009 > data_col = ["M", "b", "Mabc", "abc m b"]
 => ["M", "b", "Mabc", "abc m b"]
...
2.4.0 :015 > data_col.select{|string| string =~ /^[#{Regexp.union(TOKENS)}]([[:space:]]|$)/i }
 => ["M", "b"]

This matches both the "M" and the "b" entries, even though "b" does not appear in my list of TOKENS. How do I modify my regular expression so that only the proper value, "M", is matched?

I'm using Ruby 2.4.

1 Solution

#1

I'd use:

TOKENS = ["m", "o"]
DATA_COL = ["M", "b", "Mabc", "abc m b"]
RE = /^(?:#{Regexp.union(TOKENS).source})(?: |$)/i

DATA_COL.select{ |string| string[RE] }
# => ["M"]

Breaking it down a bit:

Regexp.union(TOKENS).source # => "m|o"
/^(?:#{Regexp.union(TOKENS).source})(?: |$)/i # => /^(?:m|o)(?: |$)/i

Writing /^[#{Regexp.union(TOKENS)}]([[:space:]]|$)/i inside a loop isn't a good idea. Each time through, you force Ruby to create the pattern again. Efficiency is important inside loops, especially big ones, so create the pattern once outside the loop and then refer to it inside.
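A minimal sketch of that advice, assuming a hypothetical helper named starts_with_token (the name and signature are illustrative and not part of the original code). The pattern is built once, before iterating, and reused for every element:

```ruby
# Hypothetical helper: the name and signature are illustrative only.
def starts_with_token(strings, tokens)
  # Built once, before iterating; .source keeps the outer /i flag in effect.
  pattern = /^(?:#{Regexp.union(tokens).source})(?: |$)/i
  strings.select { |string| string.match?(pattern) }
end

p starts_with_token(["M", "b", "Mabc", "abc m b"], ["m", "o"])
# => ["M"]
```

Keeping the pattern in a local (or a constant, as in the code above) means the interpolation runs once per call instead of once per element.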

The next problem is that the pattern returned by Regexp.union carries its own idea of which case it should match:

Regexp.union(TOKENS).to_s        # => "(?-mix:m|o)"

The (?-mix: part is how the regular expression engine remembers the flags for the pattern. When a pattern is embedded inside another pattern, it keeps its own flags and continues to know what it should look for, causing us to gnash our teeth and weep:

/#{Regexp.union(TOKENS)}/i # => /(?-mix:m|o)/i

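The trailing i tells the pattern it should ignore case, but the embedded (?-mix: group has i turned off, so that part still honors case. That's what breaks your pattern. The square brackets make it worse: inside [...] the interpolated text is read as a character class, so (?-mix:m|o) becomes a set of single characters in which ?-m is a range, and "b" falls inside that range.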
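To see the difference concretely, here is a small sketch that compares interpolating the Regexp object with interpolating only its source, and also rebuilds the bracketed pattern from the question. The variable names are illustrative; Regexp#match? requires Ruby 2.4 or later.

```ruby
tokens = ["m", "o"]
union  = Regexp.union(tokens)

# Interpolating the Regexp object embeds union.to_s, flags included.
embedded = /^(?:#{union})(?: |$)/i
# Interpolating only the source leaves the outer /i in charge.
by_source = /^(?:#{union.source})(?: |$)/i
# The pattern from the question: the brackets create a character class.
char_class = /^[#{union}]([[:space:]]|$)/i

puts embedded.source        # ^(?:(?-mix:m|o))(?: |$)
puts by_source.source       # ^(?:m|o)(?: |$)

puts embedded.match?("m")   # true
puts embedded.match?("M")   # false: the inner group switched i off
puts by_source.match?("M")  # true
puts char_class.match?("b") # true: "b" lies in the ?-m range
```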

The fix is to use source when embedding, as I did above.
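One more point in favor of source: Regexp.union escapes metacharacters in string tokens, and source preserves that escaping. A short sketch, assuming hypothetical tokens that contain "." and "+" (they are not from the question):

```ruby
# Hypothetical tokens containing regexp metacharacters.
tokens = ["m.", "o+"]

source = Regexp.union(tokens).source
puts source   # m\.|o\+

pattern = /^(?:#{source})(?: |$)/i
p ["M. x", "Mx", "o+", "ooo"].select { |string| string.match?(pattern) }
# => ["M. x", "o+"]
```

The "." and "+" are matched literally, so "Mx" and "ooo" are rejected.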

See the Regex "options" section for more information.
