Case-insensitive search and whole-word replace

Time: 2021-07-24 16:50:57

I need to search for a specific pattern and replace it only when it appears as a whole word or as a combination of a few words. I am struggling with metacharacters. Say my search pattern is "corp." and it should be replaced with "Corporation", so for the input "SS Corp. Ltd" the expected output is "SS Corporation Ltd".


I tried using:


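package main

import (
    "fmt"
    "regexp"
)

func main() {
    search := "corp."
    rep := "Corporation"
    sample := "SS Corp. LTd"
    // The "." in search is not escaped and the pattern is case sensitive.
    var re = regexp.MustCompile(`(^|[^_])\b` + search + `\b([^_]|$)`)
    s2 := re.ReplaceAllString(sample, "${1}"+rep+"${2}")
    // Prints the input unchanged, because the pattern does not match.
    fmt.Println(s2)
}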

1 Solution

#1



There are several problems here:

  1. An unescaped . matches any character other than a line break, so it must be escaped. Since you are building the pattern dynamically, use regexp.QuoteMeta.
  2. A \b word boundary placed after . only matches when a word character follows, so a\.\b cannot match the a. in "a. b". Replace the boundaries with (^|[^\p{L}0-9_]) for the leading boundary and ([^\p{L}0-9_]|$) for the trailing boundary.
  3. At this stage, the pattern is built as `(?i)(^|[^\p{L}0-9_])`+regexp.QuoteMeta(search)+`([^\p{L}0-9_]|$)`. Both boundary groups consume the character they match, so two adjacent occurrences that share a single separator cannot both match in one pass: corp. corp. becomes Corporation corp., and the second occurrence is not replaced. Repeat the replacement until the regex no longer matches.
  4. To make the pattern case insensitive, put the (?i) inline modifier at the start of the pattern.
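
The first two points can be checked directly. The following is a minimal sketch; the `matches` helper is a name made up for this illustration, not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches is an illustrative helper (not from the original answer):
// it compiles pattern and reports whether it matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	search := "corp."

	// QuoteMeta escapes the dot so that it only matches a literal ".".
	quoted := regexp.QuoteMeta(search)
	fmt.Println(quoted) // corp\.

	// Unescaped, "." matches any character, so "corps" is a false positive.
	fmt.Println(matches(`(?i)`+search, "corps")) // true
	fmt.Println(matches(`(?i)`+quoted, "corps")) // false

	// A trailing \b after "." needs a word character right after the dot,
	// so it fails before a space and succeeds before a letter.
	fmt.Println(matches(`(?i)`+quoted+`\b`, "SS Corp. Ltd")) // false
	fmt.Println(matches(`(?i)`+quoted+`\b`, "SS Corp.Ltd"))  // true
}
```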

The resulting regex looks like this:

(?i)(^|[^\p{L}0-9_])corp\.([^\p{L}0-9_]|$)

See the regex demo.


Details


  • (?i) - case-insensitive modifier
  • (^|[^\p{L}0-9_]) - either the start of the string or a char other than a Unicode letter, an ASCII digit, or _
  • corp\. - the literal substring corp.
  • ([^\p{L}0-9_]|$) - either a char other than a Unicode letter, an ASCII digit, or _, or the end of the string
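
To see how each part behaves, the pattern can be run against a few inputs. This is a rough sketch; `wholeWord` is a hypothetical helper name and the sample strings are chosen only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// wholeWord is a hypothetical helper: it wraps a literal search term in the
// two consuming boundary groups described above.
func wholeWord(search string) *regexp.Regexp {
	return regexp.MustCompile(`(?i)(^|[^\p{L}0-9_])` + regexp.QuoteMeta(search) + `([^\p{L}0-9_]|$)`)
}

func main() {
	re := wholeWord("corp.")
	samples := []string{"corp.", "SS Corp. Ltd", "(CORP.)", "Incorp. Ltd", "corp.x", "corp._"}
	for _, s := range samples {
		// %q makes the captured boundary characters (groups 1 and 2) visible.
		fmt.Printf("%-16q matched=%-5v groups=%q\n", s, re.MatchString(s), re.FindStringSubmatch(s))
	}
}
```

The first three samples should match (start or end of string, a space, and parentheses all count as boundaries), while the last three should not, because a letter or _ touches the term.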

Here is a complete example:

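package main

import (
    "fmt"
    "regexp"
)

func main() {
    search := "corp."
    rep := "Corporation"
    sample := "SS Corp. Corp. LTd"
    // Case-insensitive pattern with consuming boundaries around the escaped term.
    var re = regexp.MustCompile(`(?i)(^|[^\p{L}0-9_])` + regexp.QuoteMeta(search) + `([^\p{L}0-9_]|$)`)
    fmt.Println(re)
    var res = sample
    // Repeat until nothing matches, so adjacent occurrences are all replaced.
    for re.MatchString(res) {
        res = ReplaceWith(res, re, "${1}"+rep+"${2}")
    }
    fmt.Println(res)
}

// ReplaceWith replaces every match of re in s with repl,
// where ${1} and ${2} restore the captured boundary characters.
func ReplaceWith(s string, re *regexp.Regexp, repl string) string {
    return re.ReplaceAllString(s, repl)
}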

Result: SS Corporation Corporation LTd.
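
One caveat with repeating the replacement: if the replacement text itself matches the pattern as a whole word (for example, replacing "corp." with "Corp."), the loop never terminates. A possible alternative, sketched below under that concern, matches only the escaped term and checks the neighbouring characters by hand in a single pass. `replaceWholeWord` and `isWordRune` are hypothetical names, and this is not the approach from the answer above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isWordRune mirrors the class [\p{L}0-9_] used in the pattern above.
func isWordRune(r rune) bool {
	return unicode.IsLetter(r) || (r >= '0' && r <= '9') || r == '_'
}

// replaceWholeWord is a hypothetical single-pass helper: it finds the literal
// term case-insensitively and replaces it only when the characters on both
// sides (if any) are not word characters. rep is inserted literally.
func replaceWholeWord(s, search, rep string) string {
	if search == "" {
		return s
	}
	re := regexp.MustCompile(`(?i)` + regexp.QuoteMeta(search))
	var b strings.Builder
	pos := 0
	for pos < len(s) {
		loc := re.FindStringIndex(s[pos:])
		if loc == nil {
			break
		}
		start, end := pos+loc[0], pos+loc[1]
		prev, _ := utf8.DecodeLastRuneInString(s[:start])
		next, _ := utf8.DecodeRuneInString(s[end:])
		okBefore := start == 0 || !isWordRune(prev)
		okAfter := end == len(s) || !isWordRune(next)
		if okBefore && okAfter {
			b.WriteString(s[pos:start])
			b.WriteString(rep)
			pos = end
		} else {
			// Not a whole word: copy up to one rune past the match start and retry.
			_, size := utf8.DecodeRuneInString(s[start:])
			b.WriteString(s[pos : start+size])
			pos = start + size
		}
	}
	b.WriteString(s[pos:])
	return b.String()
}

func main() {
	fmt.Println(replaceWholeWord("SS Corp. Corp. LTd", "corp.", "Corporation")) // SS Corporation Corporation LTd
	fmt.Println(replaceWholeWord("Incorp. Ltd", "corp.", "Corporation"))        // Incorp. Ltd
}
```

Because the boundary characters are only inspected and never consumed, adjacent occurrences are handled in one pass, and the replaced text is never rescanned.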

