I need to search for a specific pattern and replace it only when it occurs as a whole word or as a combination of a few words. I am struggling with metacharacters. Say my search pattern is "corp." and it should be replaced with "Corporation", so for the input "SS Corp. Ltd" the expected output is "SS Corporation Ltd".
I tried using:
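package main

import (
	"fmt"
	"regexp"
)

func main() {
	search := "corp."
	rep := "Corporation"
	sample := "SS Corp. LTd"
	// The search string is concatenated unescaped and wrapped in \b boundaries.
	var re = regexp.MustCompile(`(^|[^_])\b` + search + `\b([^_]|$)`)
	s2 := re.ReplaceAllString(sample, "${1}"+rep+"${2}")
	fmt.Println(s2) // the sample comes back unchanged
}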
1 solution
#1
There are several problems here:
- An unescaped . matches any char other than a line break, so it must be escaped. Since you are building the pattern dynamically, use regexp.QuoteMeta.
- A \b word boundary after . requires a word char to follow, so you can't expect a\.\b to match "a. b". Replace the boundaries with (^|[^\p{L}0-9_]) for the leading boundary and ([^\p{L}0-9_]|$) for the trailing boundary.
- At this stage, the pattern will be built like this: `(?i)(^|[^\p{L}0-9_])`+regexp.QuoteMeta(search)+`([^\p{L}0-9_]|$)`. However, since both boundaries are consuming patterns, a single pass never matches consecutive occurrences ("corp. corp." results in "Corporation corp.", so the second one is not replaced). You should repeat the replacement until no regex match can be found.
- To make the pattern case insensitive, use the (?i) inline modifier at the pattern start.
The regex will look like
(?i)(^|[^\p{L}0-9_])corp\.([^\p{L}0-9_]|$)
See the regex demo.
Details
- (?i) - case insensitive modifier
- (^|[^\p{L}0-9_]) - either the start of the string or a char other than a Unicode letter, an ASCII digit, or _
- corp\. - a literal corp. substring
- ([^\p{L}0-9_]|$) - either a char other than a Unicode letter, an ASCII digit, or _, or the end of the string
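To see what each part of the pattern captures, here is a small sketch that prints the submatches for a few inputs. The inputs other than "SS Corp. Ltd" are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// re is the pattern from the answer, written out for search = "corp.".
var re = regexp.MustCompile(`(?i)(^|[^\p{L}0-9_])corp\.([^\p{L}0-9_]|$)`)

func main() {
	inputs := []string{
		"SS Corp. Ltd", // matches: both groups capture a space
		"corp.",        // matches: both groups are empty (start and end of string)
		"SSCorp. Ltd",  // no match: a letter precedes "Corp."
		"Corp.x",       // no match: a letter follows the dot
	}
	for _, s := range inputs {
		// FindStringSubmatch returns nil when there is no match,
		// else [whole match, group 1, group 2].
		fmt.Printf("%q -> %q\n", s, re.FindStringSubmatch(s))
	}
}
```

Because groups 1 and 2 consume the neighbouring chars, the replacement string has to put them back with ${1} and ${2}; otherwise the spaces around the word would be lost.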
See this example demo:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	search := "corp."
	rep := "Corporation"
	sample := "SS Corp. Corp. LTd"
	// Escape the search string and wrap it in consuming boundary groups.
	var re = regexp.MustCompile(`(?i)(^|[^\p{L}0-9_])` + regexp.QuoteMeta(search) + `([^\p{L}0-9_]|$)`)
	fmt.Println(re)
	var res = sample
	// Repeat until no match is left, so consecutive occurrences are handled.
	for re.MatchString(res) {
		res = ReplaceWith(res, re, "${1}"+rep+"${2}")
	}
	fmt.Println(res)
}

// ReplaceWith replaces all non-overlapping matches of re in s with repl.
func ReplaceWith(s string, re *regexp.Regexp, repl string) string {
	return re.ReplaceAllString(s, repl)
}
Result: SS Corporation Corporation LTd
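As a possible variation on the repeat-until-no-match loop (this is a different technique, not the answer's method), the boundaries can be checked by hand instead of consuming them in the pattern, which lets a single pass handle consecutive occurrences. The sketch below assumes a hypothetical helper replaceWholeWord; isWordRune is meant to mirror the class [\p{L}0-9_]. The replacement is written literally, so a $ in rep is not treated as a group reference.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isWordRune mirrors the character class [\p{L}0-9_] used in the answer.
func isWordRune(r rune) bool {
	return unicode.IsLetter(r) || (r >= '0' && r <= '9') || r == '_'
}

// replaceWholeWord (hypothetical name) replaces case-insensitive occurrences
// of search in s that are not adjacent to a word rune on either side.
func replaceWholeWord(s, search, rep string) string {
	if search == "" {
		return s // an empty pattern would match at every position
	}
	re := regexp.MustCompile(`(?i)` + regexp.QuoteMeta(search))
	var b strings.Builder
	last := 0
	for _, loc := range re.FindAllStringIndex(s, -1) {
		start, end := loc[0], loc[1]
		prev, _ := utf8.DecodeLastRuneInString(s[:start])
		next, _ := utf8.DecodeRuneInString(s[end:])
		if (start > 0 && isWordRune(prev)) || (end < len(s) && isWordRune(next)) {
			continue // glued to a word char: not a whole-word occurrence
		}
		b.WriteString(s[last:start])
		b.WriteString(rep) // literal text, no ${n} expansion
		last = end
	}
	b.WriteString(s[last:])
	return b.String()
}

func main() {
	fmt.Println(replaceWholeWord("SS Corp. Corp. LTd", "corp.", "Corporation"))
	// SS Corporation Corporation LTd
}
```

One limitation of this sketch: FindAllStringIndex only reports non-overlapping candidates, so a search string that can overlap with itself may miss an occurrence that the looping version would find.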