Ignoring a character along with a word boundary in regex

Date: 2022-02-03 20:46:03

I am using gsub in Ruby to make a word within text bold. I am using a word boundary so as to not make letters within other words bold, but am finding that this ignores words that have a quote after them. For example:

text.gsub(/#{word}\b/i, "<b>#{word}</b>")

text = "I said, 'look out below'"
word = "below"

In this case the word below is not made bold. Is there any way to ignore certain characters along with a word boundary?

3 Answers

#1


2  

All that escaping in the Regexp.new is looking quite ugly. You could greatly simplify that by using a Regexp literal:

word = 'below'
text = "I said, 'look out below'"

reg = /\b#{word}\b/i
text.gsub!(reg, '<b>\0</b>')

Also, you could call the in-place form, gsub!, directly, unless the string is aliased somewhere else in code that you are not showing us. Lastly, if you use a single-quoted string literal in your gsub call, you don't need to escape the backslash.
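A minimal sketch of the points above, using the question's sample text. The `not_present` pattern and the variable names `bolded`, `copy`, and `result` are only illustrative. One caveat with the in-place form: gsub! returns nil when nothing was replaced, so its return value should not be chained blindly.

```ruby
word = 'below'
text = "I said, 'look out below'"

# Regexp literal with interpolation; \b on both sides, /i ignores case.
reg = /\b#{word}\b/i

# gsub returns a new string and leaves text untouched.
bolded = text.gsub(reg, '<b>\0</b>')

# In single quotes the backslash needs no escaping; in double quotes it does.
same_replacement = ('<b>\0</b>' == "<b>\\0</b>")

# gsub! modifies the receiver in place and returns nil if nothing matched.
copy = text.dup
result = copy.gsub!(/\bnot_present\b/, '<b>\0</b>')
```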

#2


2  

Be very careful with your \b boundaries. Here’s why.
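One well-known pitfall with \b (not necessarily the one this answer had in mind) is that it only matches between a word character and a non-word character. If the search term itself starts or ends with punctuation, \b never matches at that edge. The sketch below assumes a hypothetical helper, `bold_whole`, that uses lookarounds instead of \b; it is an illustration, not the answerer's code.

```ruby
# Hypothetical helper: wrap whole-word occurrences of word in <b> tags.
# (?<!\w) and (?!\w) only require that no word character is adjacent,
# so they also work when word begins or ends with punctuation.
def bold_whole(text, word)
  pattern = /(?<!\w)#{Regexp.escape(word)}(?!\w)/i
  text.gsub(pattern) { |match| "<b>#{match}</b>" }
end

# With \b, a term that starts with "-" is not found after a space,
# because there is no word/non-word transition between " " and "-".
naive = "use -foo here".gsub(/\b#{Regexp.escape('-foo')}\b/, '<b>\0</b>')

fixed  = bold_whole("use -foo here", "-foo")
quoted = bold_whole("I said, 'look out below'", "below")
inside = bold_whole("sunday", "day")
```

For plain alphanumeric words such as "below", both approaches behave the same.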

#3


0  

Interpolation with #{word} does work inside a Regexp literal, but you can also build the pattern from a string with Regexp.new:

word = "below"
text = "I said, 'look out below'"

reg = Regexp.new("\\b#{word}\\b", true)
text = text.gsub(reg, "<b>\\0</b>")

Note that when building the pattern from a double-quoted string you need to write \b as \\b, or it is interpreted as a backspace. If word may contain special regex characters, escape it with Regexp.escape.
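A short sketch of both points. The word "a.c" is a made-up example containing a regex metacharacter, and the variable names are illustrative only.

```ruby
# In a double-quoted string, "\b" is the backspace character (code 8),
# while "\\b" is the two characters backslash and b that the regex needs.
backspace = "\b"
boundary  = "\\b"

# Hypothetical search word containing a regex metacharacter.
word = "a.c"

# Without escaping, "." matches any character.
unsafe = Regexp.new("\\b#{word}\\b", Regexp::IGNORECASE)

# Regexp.escape turns "." into "\." so it matches only a literal dot.
safe = Regexp.new("\\b#{Regexp.escape(word)}\\b", Regexp::IGNORECASE)

unsafe_hits_abc = unsafe.match?("abc")
safe_hits_abc   = safe.match?("abc")
safe_hits_word  = safe.match?("a.c")
```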

Also, replacing the match with <b>#{word}</b> may change the casing of the text: "BeloW" would be replaced with "below". Using \0 avoids this by inserting the text that was actually matched. In addition, I added \\b at the beginning: you don't want to search for "day" and end up matching "sunday".
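To make the difference concrete, here is a small comparison on a made-up sentence (the sample text and variable names are assumptions for illustration):

```ruby
text = "BeloW the line, on Sunday"

# Interpolating the search word into the replacement loses the original casing.
lossy = text.gsub(/\bbelow\b/i, "<b>below</b>")

# \0 inserts whatever was actually matched, so the casing survives.
kept = text.gsub(/\bbelow\b/i, '<b>\0</b>')

# Without a leading \b, "day" is also found at the end of "Sunday".
partial = text.gsub(/day\b/i, '<b>\0</b>')

# With \b on both sides, "day" inside "Sunday" is left alone.
whole = text.gsub(/\bday\b/i, '<b>\0</b>')
```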
