Ruby: How do you split a regular expression across multiple lines?

I have a 141-character regular expression in my Rails application, and RuboCop doesn't like it.

My regular expression:

URL_REGEX = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/

This pattern matches a URL followed by a single path segment, e.g. http(s)://example.com/path

Can you safely split a regular expression in Ruby? What is the general mechanism for splitting a regular expression in Ruby?

How do you tell Rubocop to take it easy on regular expressions?

Thanks a lot!

4 solutions

#1

You should try something like this:

# %r{...} means forward slashes need no escaping; the x flag lets the
# pattern continue on the next line.
regexp = %r{\A(http://www\.|https://www\.|http://|https://)?[a-z0-9]+
            ([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(/[-\w.]+)\z}x

if 'http://example.com/path' =~ regexp
  puts 'matches'
end

The "x" at the end is to ignore whitespace and comments in the pattern.
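Because the x flag also allows comments, the pattern can be laid out one component per line. The following is only a sketch of that layout, using the pattern from the question; the comments are my reading of what each part matches.

```ruby
# Extended (x) mode: whitespace and "#" comments inside the literal are ignored.
URL_REGEX = %r{
  \A
  (http://www\.|https://www\.|http://|https://)?  # optional scheme, with or without www.
  [a-z0-9]+([\-.][a-z0-9]+)*                      # host labels joined by "-" or "."
  \.[a-z]{2,5}                                    # top-level domain
  (:[0-9]{1,5})?                                  # optional port
  (/[-\w.]+)                                      # exactly one path segment
  \z
}x

puts URL_REGEX.match?('https://www.example.co.uk:8080/index.html')
puts URL_REGEX.match?('http://example.com/a/b')
```

Note that in extended mode a literal space or "#" in the pattern has to be escaped (or placed in a character class), since both are otherwise skipped.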

See the last example in the regular expressions section of the Ruby style guide: https://github.com/github/rubocop-github/blob/master/STYLEGUIDE.md#regular-expressions

#2

How do you tell Rubocop to take it easy on regular expressions?

The cop that is complaining about this is likely Metrics/LineLength (renamed Layout/LineLength in RuboCop 0.78 and later). There is no configuration option to ignore regular expressions, but you can disable the cop inline if you are okay with the regexp being that long:

# rubocop:disable Metrics/LineLength
URL_REGEX = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/
# rubocop:enable Metrics/LineLength

It is also possible to put just a trailing rubocop:disable at the end of the line, but since the line is already very long, it could easily be missed, so the enable-disable combo might be better here.
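For comparison, the trailing form might look like the sketch below. It uses the cop name from this answer; substitute whatever name your RuboCop version reports.

```ruby
# A trailing directive disables the cop for this one line only, so no
# matching rubocop:enable comment is needed.
URL_REGEX = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/ # rubocop:disable Metrics/LineLength

puts URL_REGEX.match?('https://example.com/path') # prints true
```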

#3

Yes, you can build the regex from smaller parts and interpolate them into the final regex.

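# Regexp.union escapes plain strings itself, so no Regexp.escape is needed.
prefix = Regexp.union(%w[http://www. https://www. http:// https://])
# A Regexp literal keeps \d intact; in a double-quoted string "\d" is just "d".
letters = /[a-z\d]+/
URL_REGEX = /\A(#{prefix})?#{letters}([-.]#{letters})*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-.\w]+)\z/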
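One detail worth knowing when composing a regex this way: an interpolated Regexp is embedded together with its own flags, so each part keeps its own case sensitivity. The sketch below illustrates this with hypothetical names (`scheme`, `host`, `SIMPLE_URL`) and a simplified pattern, not the full one from the question.

```ruby
# Hypothetical parts for illustration only.
scheme = /https?:\/\//i                           # case-insensitive part
host   = /[a-z\d]+(?:[-.][a-z\d]+)*\.[a-z]{2,5}/  # case-sensitive part

# Interpolating a Regexp inserts its to_s form, which carries its flags.
puts scheme.to_s # prints (?i-mx:https?:\/\/)

SIMPLE_URL = /\A(?:#{scheme})?#{host}\z/

puts SIMPLE_URL.match?('HTTP://example.com') # true: the scheme part ignores case
puts SIMPLE_URL.match?('http://EXAMPLE.com') # false: the host part does not
```

If you want the final regex's flags to apply to a part instead, interpolate `part.source` rather than the Regexp itself.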

#4

Another option would be to use a more concise regex. There are several places where you are repeating patterns when you don't need to.

/\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   (https?:\/\/(www\.)?)?

With that and a few more alterations, I got your regex down to:

/\A(https?:\/\/(www\.)?)?[-a-z0-9.]+\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/

It's not exactly equivalent, but here's my test.
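To make the difference concrete, here is a small sketch comparing the two patterns. It assumes the concise pattern is anchored with \A and \z and has the dot after www escaped; `ORIGINAL`, `CONCISE`, and the sample strings are illustrative names and inputs. The single character class `[-a-z0-9.]+` is looser than the original host pattern, so it accepts consecutive dots and a leading hyphen.

```ruby
# The question's pattern, written with %r{} so slashes need no escaping.
ORIGINAL = %r{\A(http://www\.|https://www\.|http://|https://)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(/[-\w.]+)\z}
# The concise pattern (assumed anchoring and escaping, see above).
CONCISE  = %r{\A(https?://(www\.)?)?[-a-z0-9.]+\.[a-z]{2,5}(:[0-9]{1,5})?(/[-\w.]+)\z}

samples = [
  'http://example.com/path',   # accepted by both
  'http://example..com/path',  # consecutive dots: accepted only by CONCISE
  '-.com/path'                 # leading hyphen: accepted only by CONCISE
]

samples.each do |url|
  puts "#{url}: original=#{ORIGINAL.match?(url)} concise=#{CONCISE.match?(url)}"
end
```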
