How to find and replace a particular character, but only if it is in quotes?

Time: 2022-12-27 19:21:21

Problem: I have thousands of documents that contain a specific character I don't want, e.g. the character a. These documents contain a variety of characters, but the a's I want to replace are inside double quotes or single quotes.

I would like to find and replace them, and I thought a regex would be needed. I am using VSCode, but I'm open to any suggestions.

My attempt: I was able to find the following regex, which matches a quoted string containing the value inside the ().

".*?(r).*?"

However, this highlights the entire quote. I want to highlight only the character.

Any solution, perhaps outside of regex, is welcome.

Example outcomes: given that the character is a, find and replace it with b:

Somebody once told me "apples" are good for you => Somebody once told me "bpples" are good for you

"Aardvarks" make good kebabs => "Abrdvbrks" make good kebabs

The boy said "aaah!" when his mom told him he was eating aardvark => The boy said "bbbh!" when his mom told him he was eating aardvark

6 solutions

#1


8  

Visual Studio Code

VS Code uses the JavaScript RegEx engine for its find / replace functionality. This means you are very limited in working with regex in comparison to other flavors like .NET or PCRE.

Luckily, this flavor supports lookaheads, and with a lookahead you can look for characters without consuming them. So one way to ensure that we are within a quoted string is to require, after matching an a, that the number of quotes between it and the end of the file / subject string be odd:

a(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)
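As a minimal sketch, the pattern above can be tried on the question's examples in plain JavaScript. It assumes each text is handled as one string (without the m flag, $ means end of input) and that quotes are balanced and never escaped. The helper name replaceInDoubleQuotes is made up for illustration.

```javascript
// Pattern copied from the answer: an "a" followed by an odd number of
// double quotes before the end of the input, i.e. an "a" inside quotes.
const insideDoubleQuotes = /a(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)/g;

// Hypothetical helper, not part of the answer.
function replaceInDoubleQuotes(text, replacement) {
  return text.replace(insideDoubleQuotes, replacement);
}

console.log(replaceInDoubleQuotes('Somebody once told me "apples" are good for you', 'b'));
console.log(replaceInDoubleQuotes('"Aardvarks" make good kebabs', 'b'));
```

The a in "are", "make" and "kebabs" is left alone because no double quote, or an even number of them, follows it.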

This looks for a's in a double-quoted string; to have it work for single-quoted strings, substitute every " with '. You can't have both at a time.

There is a problem with the regex above, however: it conflicts with escaped double quotes within double-quoted strings. To handle them too, if it matters, you have a long way to go:

a(?=[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*(?:"[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*)*$)
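The difference between the two patterns shows up on a string that contains an escaped quote. The sketch below is only an illustration under the same single-string assumption; both patterns are copied from the answer and the sample text is made up.

```javascript
// Both patterns are copied from the answer; only the wrapper code is new.
const simple = /a(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)/g;
const escapeAware = /a(?=[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*(?:"[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*)*$)/g;

// String.raw keeps the backslash, so the sample text is: "a\"a" a
const sample = String.raw`"a\"a" a`;

// The simple pattern counts the escaped quote and misses the first a.
const withSimple = sample.replace(simple, 'b');
// The escape-aware pattern skips \" and replaces both quoted a's.
const withEscapeAware = sample.replace(escapeAware, 'b');

console.log(withSimple);
console.log(withEscapeAware);
```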

Applying these approaches to large files will probably result in a stack overflow, so let's see a better approach.

I am using VSCode, but I'm open to any suggestions.

That's great. Then I'd suggest using awk or sed or something more programmatic to achieve what you are after, or, if you are able to use Sublime Text, there is a chance to work around this problem in a more elegant way.
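One possible shape of "something more programmatic" is a small scanner that walks the text once and tracks whether it is inside quotes. This is a sketch rather than the answer's own method: the function name and parameters are hypothetical, and it assumes backslash escapes only matter inside quotes.

```javascript
// Hypothetical helper (not from the answer): single pass, no regex backtracking.
function replaceInsideQuotes(text, target, replacement, quote = '"') {
  let inside = false; // true while the scanner is between two quote characters
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '\\' && inside && i + 1 < text.length) {
      // Copy an escaped character (for example \") verbatim and skip over it.
      out += ch + text[i + 1];
      i++;
    } else if (ch === quote) {
      inside = !inside;
      out += ch;
    } else if (inside && ch === target) {
      out += replacement;
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(replaceInsideQuotes('The boy said "aaah!" when his mom told him he was eating aardvark', 'a', 'b'));
```

Because the work per character is constant, the same function could be applied file by file to large documents.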

Sublime Text

This is supposed to work on large files with hundreds of thousands of lines, but note that it works for a single character (here a); with some modifications it may work for a word or substring too:

Search for:

(?:"|\G(?<!")(?!\A))(?<r>[^a"\\]*+(?>\\.[^a"\\]*)*+)\K(a|"(*SKIP)(*F))(?(?=((?&r)"))\3)
                           ^              ^            ^

Replace it with: WHATEVER\3

RegEx Breakdown:

(?: # Beginning of non-capturing group #1
    "   # Match a `"`
    |   # Or
    \G(?<!")(?!\A)  # Continue matching from last successful match
                    # It shouldn't start right after a `"`
)   # End of NCG #1
(?<r>   # Start of capturing group `r`
    [^a"\\]*+   # Match anything except `a`, `"` or a backslash (possessively)
    (?>\\.[^a"\\]*)*+   # Match an escaped character or 
                        # repeat last pattern as much as possible
)\K     # End of CG `r`, reset all consumed characters
(   # Start of CG #2 
    a   # Match literal `a`
    |   # Or
    "(*SKIP)(*F)    # Match a `"` and skip over current match
)
(?(?=   # Start a conditional group whose condition is a positive lookahead
    ((?&r)")    # Start of CG #3: recurse into CG `r`, then match `"`
  )     # End of condition
  \3    # If the condition passed, match CG #3
 )  # End of conditional

Three-step approach

Last but not least...

Matching a character inside quotation marks is tricky because the delimiters are exactly the same, so opening and closing marks cannot be distinguished from each other without looking at the adjacent strings. What you can do is change one delimiter to something else so that you can look for it later.

Step 1:

Search for: "[^"\\]*(?:\\.[^"\\]*)*"

Replace with: $0Я

Step 2:

Search for: a(?=[^"\\]*(?:\\.[^"\\]*)*"Я)

Replace with whatever you expect.

Step 3:

Search for: "Я

Replace with: " (this removes only the marker and reverts everything else).
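The three steps can be sketched as one JavaScript function. This is an illustration, not the answer's own code: the function name is hypothetical, it assumes the marker Я never occurs in the input, and it uses $& because that is how JavaScript writes the whole match that the editor calls $0.

```javascript
// Marker assumed not to occur anywhere in the input (the answer uses Я).
const MARK = 'Я';

// Hypothetical helper that chains the three search/replace steps.
function threeStepReplace(text, replacement) {
  // Step 1: append the marker to every double-quoted string.
  const marked = text.replace(/"[^"\\]*(?:\\.[^"\\]*)*"/g, '$&' + MARK);
  // Step 2: replace each a that can reach a marked closing quote.
  const replaced = marked.replace(/a(?=[^"\\]*(?:\\.[^"\\]*)*"Я)/g, replacement);
  // Step 3: drop the marker again, keeping the closing quote.
  return replaced.replace(/"Я/g, '"');
}

console.log(threeStepReplace('"Aardvarks" make good kebabs', 'b'));
```

Only closing quotes carry the marker, so an a that sits in front of an opening quote cannot satisfy the lookahead in step 2.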


#2


2  

Firstly, a few considerations:

  1. There could be multiple a characters within a single quote.
  2. Each quote (using single or double quotation marks) consists of an opening quote character, some text and the same closing quote character. A simple approach is to assume that when the quote characters are counted sequentially, the odd ones are opening quotes and the even ones are closing quotes.
  3. Following point 2, it could be worth some further thought on whether single-quoted strings should be allowed. See the following example: It's a shame 'this quoted text' isn't quoted. Here, the simple approach would think there were two quoted strings: s a shame and isn. Another: This isn't a quote ...'this is' and 'it's unclear where this quote ends'. I've avoided attempting to tackle these complexities and gone with the simple approach below.
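The apostrophe problem from the last point is easy to reproduce. The sketch below pairs single quotes naively (odd ones open, even ones close) on the example sentence; the variable names are made up for illustration.

```javascript
// Naive pairing: every odd single quote opens, every even one closes.
const sentence = "It's a shame 'this quoted text' isn't quoted";
const naivePairs = sentence.match(/'[^']*'/g);

// The apostrophes in "It's" and "isn't" are treated as quote characters,
// so the pieces found are not the text that was actually quoted.
console.log(naivePairs);
```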

The bad news is that point 1 presents a bit of a problem, as a capturing group followed by a repetition quantifier (e.g. (.*)*) will only capture the last captured "thing". But the good news is there's a way of getting around this within certain limits. Many regex engines allow up to 99 capturing groups (*). So if we can assume that there will be no more than 99 a's in each quote (UPDATE ...or even if we can't - see step 3), we can do the following...

(*) Unfortunately my first port of call, Notepad++, doesn't - it only allows up to 9. Not sure about VS Code. But regex101 (used for the online demos below) does.
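The "last captured thing" behaviour can be seen with a single repeated group in JavaScript. This is a small illustration with a made-up sample string, not part of the answer.

```javascript
// One repeated group instead of 99 copies: the group matches three times,
// but only the text from its last iteration is kept.
const repeated = /"(?:([^a"]*)a)*"/;
const m = repeated.exec('"banana"');

console.log(m[0]); // the whole quoted string
console.log(m[1]); // only the last piece captured by the group
```

The earlier pieces are lost, which is why the answer spells the group out once per expected a.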

TL;DR - What to do?

  1. Search for: "([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*"
  2. 搜索:[^“]”(*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*([^]*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*([^]*)*”
  3. Replace with: "\1\2\3\4\5\6\7\8\9\10\11\12\13\14\15\16\17\18\19\20\21\22\23\24\25\26\27\28\29\30\31\32\33\34\35\36\37\38\39\40\41\42\43\44\45\46\47\48\49\50\51\52\53\54\55\56\57\58\59\60\61\62\63\64\65\66\67\68\69\70\71\72\73\74\75\76\77\78\79\80\81\82\83\84\85\86\87\88\89\90\91\92\93\94\95\96\97\98\99"
  4. 替换为:“1 \ \ 2 \ 3 \ 4 \ 5 \ \ 6 7 8 \ \ 9 10 \ \ 11 \ 12 13 \ 14、15、16 \ 17 \ \ 18 19 \ \ 20 \ 21 \ 22 24 \ 25 \ \ 23 \ 26、27、28、29、30、31、32、33、34、35、36、37、38、39、40、41、42、43、44、45、46、47、48 \ 49 \ 50 51 \ \ 52 55 \ 53 \ 54 \ \ 56 \ 57 \ 58 59 \ \ 60 \ 61 \ 62 \ 63 \ 64 \ 65 \ 66 \ 67 \ 68 \ 69 \ 70 \ 71 \ 72 \ 73 \ 74 \ 75 \ 76 \ 77 \ 78 \ 79 \ 80 \ 81 \ 82 \ 83 \ 84 \ 85 \ 86 \ 87 \ 88 \ 89 \ 90 \ 91 \ 92 \ 93 \ 94 \ 95 \ 96 \ 97 \ 98 \ 99”
  5. (Optionally keep repeating steps the previous two steps if there's a possibility of > 99 such characters in a single quote until they've all been replaced).
  6. (如果有可能在一个引用中出现>99这样的字符,可以继续重复前两个步骤,直到它们全部被替换)。
  7. Repeat step 1 but replacing all " with ' in the regular expression, i.e: '([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*'
  8. 重复第1步,但在正则表达式i中将“all”替换为“' i”。艾凡:”([^ ']*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*”
  9. Repeat steps 2-3.
  10. 重复步骤2 - 3。
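Outside an editor, the capture-group limit can be sidestepped entirely with a replacement callback. This is a different technique from the 99-group pattern above, shown only as a sketch: the function name is hypothetical, and it assumes balanced, unescaped quotes and a quote argument of either " or '.

```javascript
// Hypothetical alternative to the 99-group pattern: match each quoted string
// once, then rewrite its contents inside a callback.
function replaceWithCallback(text, target, replacement, quote = '"') {
  // Neither " nor ' needs escaping inside a regular expression.
  const quoted = new RegExp(`${quote}[^${quote}]*${quote}`, 'g');
  // split/join replaces every occurrence of target without another regex.
  return text.replace(quoted, (match) => match.split(target).join(replacement));
}

console.log(replaceWithCallback('The boy said "aaah!" when his mom told him he was eating aardvark', 'a', 'b'));
console.log(replaceWithCallback("say 'banana' again", 'a', 'b', "'"));
```

There is no upper bound on the number of a's per quote here, so the repeat-until-done step is not needed.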

Online demos

Please see the following regex101 demos, which could actually be used to perform the replacements if you're able to copy the whole text into the contents of "TEST STRING":

#3


1  

/(["'])(.*?)(a)(.*?\1)/g

With the replace pattern:

$1$2$4

As far as I'm aware, VS Code uses the same regex engine as JavaScript, which is why I've written my example in JS.

The problem with this is that if you have multiple a's in one set of quotes, a single pass removes only one of them. So there needs to be some sort of code behind it, or you hammering the replace button until no more matches are found, to reapply the pattern and get rid of all the a's between quotes.

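// Pattern from above: $1 = opening quote, $2 = text before the a,
// $4 = the rest up to the matching closing quote.
const regex = /(["'])(.*?)(a)(.*?\1)/g;
const subst = '$1$2$4'; // keep everything except the captured a
let str = `"a"
"helapke"
Not matched - aaaaaaa
"This is the way the world ends"
"Not with fire"
"ABBA"
"abba",
'I can haz cheezburger'
"This is not a match'
`;

// Each pass removes at most one a per match, so loop until nothing matches.
// Caveat: .*? can run past a closing quote, so an a that sits between two
// quoted strings on the same line is removed as well.
while (str.match(regex)) {
    str = str.replace(regex, subst);
}

const result = str;
console.log(result);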

#4


1  

If you can use Visual Studio (instead of Visual Studio Code), it is written in C++ and C# and uses .NET Framework regular expressions, which means you can use variable-length lookbehinds to accomplish this.

(?<="[^"\n]*)a(?=[^"\n]*")

Adding some more logic to the above regular expression, we can tell it to ignore any location that has an even number of " preceding it. This prevents matches for an a outside of quotes. Take, for example, the string "a" a "a". Only the first and last a in this string will be matched; the one in the middle will be ignored.

(?<!^[^"\n]*(?:(?:"[^"\n]*){2})+)(?<="[^"\n]*)a(?=[^"\n]*")
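Although the answer targets .NET, JavaScript runtimes that implement ES2018 lookbehind assertions also accept variable-length lookbehinds, so the pattern above can be tried there as well. This is only a sketch under that assumption (for example a recent Node.js), using single-line input because ^ without the m flag means start of input.

```javascript
// Pattern copied from the answer; assumes ES2018 lookbehind support.
const balanced = /(?<!^[^"\n]*(?:(?:"[^"\n]*){2})+)(?<="[^"\n]*)a(?=[^"\n]*")/g;

// The middle a has an even number of quotes before it, so it is skipped.
const replacedText = '"a" a "a"'.replace(balanced, 'b');
console.log(replacedText);

const replacedSentence = 'The boy said "aaah!" when his mom told him he was eating aardvark'.replace(balanced, 'b');
console.log(replacedSentence);
```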

Now the only problem is that this will break if we have an escaped " within two double quotes, such as "a\"" a "a". We need to add more logic to prevent this behaviour. Luckily, an existing answer shows how to properly match escaped ". Adding this logic to the regex above, we get the following:

(?<!^[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+)(?<="[^"\n]*)a(?=[^"\n]*")

I'm not sure which method works best with your strings, but I'll explain this last regex in detail as it also explains the two previous ones.

  • (?<!^[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+) Negative lookbehind ensuring what precedes doesn't match the following
    • ^ Assert position at the start of the line
    • [^"\n]* Match anything except " or \n any number of times
    • (?:(?:"(?:[^"\\\n]|\\.)*){2})+ Match the following one or more times. This ensures that any " preceding the match are balanced, in the sense that each opening double quote has a closing one.
      • (?:"(?:[^"\\\n]|\\.)*){2} Match the following exactly twice
        • " Match this literally
        • (?:[^"\\\n]|\\.)* Match either of the following any number of times
          • [^"\\\n] Match anything except ", \ and \n
          • \\. Matches \ followed by any character
  • (?<="[^"\n]*) Positive lookbehind ensuring what precedes matches the following
    • " Match this literally
    • [^"\n]* Match anything except " or \n any number of times
  • a Match this literally
  • (?=[^"\n]*") Positive lookahead ensuring what follows matches the following
    • [^"\n]* Match anything except " or \n any number of times
    • " Match this literally

You can drop the \n from the above pattern, as the following suggests. I added it just in case there are special cases I'm not considering (e.g. comments) that could break this regex within your text. The \A also forces the regex to match from the start of the string (or file) instead of the start of the line.

(?<!\A[^"]*(?:(?:"(?:[^"\\]|\\.)*){2})+)(?<="[^"]*)a(?=[^"]*")

#5


0  

I am using VSCode, but I'm open to any suggestions.

If you want to stay in an editor environment, you could use
Visual Studio (>= 2012) or even Notepad++ for a quick fix-up.
This avoids having to use a separate script environment.

Both of these engines (.NET and Boost, respectively) support the \G construct,
which starts the next match at the position where the last one left off.

Again, this is just a suggestion.

This regex doesn't check the validity of balanced quotes within the entire
string ahead of time (but it could with the addition of a single line).

It is all about knowing where the inside and outside of quotes are.

I've commented the regex, but if you need more info let me know.
Again, this is just a suggestion (I know your editor uses ECMAScript).

Find (?s)(?:^([^"]*(?:"[^"a]*(?=")"[^"]*(?="))*"[^"a]*)|(?!^)\G)a([^"a]*(?:(?=a.*?")|(?:"[^"]*$|"[^"]*(?=")(?:"[^"a]*(?=")"[^"]*(?="))*"[^"a]*)))
Replace $1b$2

That's all there is to it.

https://regex101.com/r/loLFYH/1

Comments

(?s)                          # Dot-all inline modifier
 (?:
      ^                             # BOS 
      (                             # (1 start), Find first quote from BOS (written back)
           [^"]* 
           (?:                           # --- Cluster
                " [^"a]*                      # Inside quotes with no 'a'
                (?= " )
                " [^"]*                       # Between quotes, get up to next quote
                (?= " )
           )*                            # --- End cluster, 0 to many times

           " [^"a]*                      # Inside quotes, will be an 'a' ahead of here
                                         # to be sucked up by this match           
      )                             # (1 end)

   |                              # OR,

      (?! ^ )                       # Not-BOS 
      \G                            # Continue where left off from last match.
                                    # Must be an 'a' at this point
 )
 a                             # The 'a' to be replaced

 (                             # (2 start), Up to the next 'a' (to be written back)
      [^"a]* 
      (?:                           # --------------------
           (?= a .*? " )                 # If stopped before 'a', must be a quote ahead
        |                              # or,
           (?:                           # --------------------
                " [^"]* $                     # If stopped at a quote, check for EOS
             |                              # or, 
                " [^"]*                       # Between quotes, get up to next quote
                (?= " )

                (?:                           # --- Cluster
                     " [^"a]*                      # Inside quotes with no 'a'
                     (?= " )
                     " [^"]*                       # Between quotes 
                     (?= " )
                )*                            # --- End cluster, 0 to many times

                " [^"a]*                      # Inside quotes, will be an 'a' ahead of here
                                              # to be sucked up on the next match                    
           )                             # --------------------
      )                             # --------------------
 )                             # (2 end)

#6


0

"Inside double quotes" is rather tricky, because there are many complicating scenarios to consider in order to fully automate this.

What are your precise rules for "enclosed by quotes"? Do you need to consider multi-line quotes? Do you have quoted strings containing escaped quotes, or quotes used for something other than starting/ending a quoted string?

However, there may be a fairly simple expression to do much of what you want.

Search expression: ("[^a"]*)a

Replacement expression: $1b

This doesn't consider inside or outside of quotes - you have to do that visually. But it highlights the text from the quote to the matching character, so you can quickly decide whether this is inside or not.

If you can live with the visual inspection, then we can build up this pattern to include different quote types and upper and lower case.
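To see why the visual check is needed, the search expression can be run in JavaScript on one match that starts inside quotes and one that does not. The sample strings and variable names are made up for illustration.

```javascript
// Pattern copied from the answer: from a double quote up to the next a,
// without crossing another double quote.
const quoteToA = /("[^a"]*)a/;

// Inside a quoted string: the match starts at the opening quote.
const insideMatch = quoteToA.exec('The boy said "aaah!"')[0];
// False positive: the match starts at the closing quote and ends outside.
const outsideMatch = quoteToA.exec('"hello" and more')[0];

console.log(JSON.stringify(insideMatch));
console.log(JSON.stringify(outsideMatch));
```

In an editor, the second kind of match is the one to skip rather than replace.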

#1


8  

Visual Studio Code

VS Code uses JavaScript RegEx engine for its find / replace functionality. This means you are very limited in working with regex in comparison to other flavors like .NET or PCRE.

VS代码使用JavaScript RegEx引擎进行查找/替换功能。这意味着与. net或PCRE等其他版本相比,您在使用regex方面非常有限。

Lucky enough that this flavor supports lookaheads and with lookaheads you are able to look for but not consume character. So one way to ensure that we are within a quoted string is to look for number of quotes down to bottom of file / subject string to be odd after matching an a:

幸运的是,这种风味支持了lookahead,而使用lookahead你可以寻找但不消耗角色。因此,确保我们在引号内的一种方法是,在匹配a后,从文件/主题字符串的底部查找引号的数量为奇数:

a(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)

Live demo

现场演示

This looks for as in a double quoted string, to have it for single quoted strings substitute all "s with '. You can't have both at a time.

这就像在双引号字符串中一样,让它作为单引号字符串替换所有的“s”。你不可能同时拥有两者。

There is a problem with regex above however, that it conflicts with escaped double quotes within double quoted strings. To match them too if it matters you have a long way to go:

然而,regex有一个问题,它与双引号字符串中的转义双引号发生冲突。如果重要的话,你还有很长的路要走:

a(?=[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*(?:"[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*)*$)

Applying these approaches on large files probably will result in an stack overflow so let's see a better approach.

在大型文件上应用这些方法可能会导致堆栈溢出,所以让我们来看一种更好的方法。

I am using VSCode, but I'm open to any suggestions.

我正在使用VSCode,但是我愿意接受任何建议。

That's great. Then I'd suggest to use awk or sed or something more programmatic in order to achieve what you are after or if you are able to use Sublime Text a chance exists to work around this problem in a more elegant way.

太好了。然后我建议使用awk或sed或其他更程序化的东西来实现你所追求的,或者如果你能够使用崇高的文本,就有机会以更优雅的方式解决这个问题。

Sublime Text

This is supposed to work on large files with hundred of thousands of lines but care that it works for a single character (here a) that with some modifications may work for a word or substring too:

这应该适用于有成千上万行代码的大型文件,但要注意,它适用于单个字符(这里是a),只要稍加修改,也可以适用于单词或子字符串:

Search for:

搜索:

(?:"|\G(?<!")(?!\A))(?<r>[^a"\\]*+(?>\\.[^a"\\]*)*+)\K(a|"(*SKIP)(*F))(?(?=((?&r)"))\3)
                           ^              ^            ^

Replace it with: WHATEVER\3

换成:无论\ 3

Live demo

现场演示

RegEx Breakdown:

正则表达式分解:

(?: # Beginning of non-capturing group #1
    "   # Match a `"`
    |   # Or
    \G(?<!")(?!\A)  # Continue matching from last successful match
                    # It shouldn't start right after a `"`
)   # End of NCG #1
(?<r>   # Start of capturing group `r`
    [^a"\\]*+   # Match anything except `a`, `"` or a backslash (possessively)
    (?>\\.[^a"\\]*)*+   # Match an escaped character or 
                        # repeat last pattern as much as possible
)\K     # End of CG `r`, reset all consumed characters
(   # Start of CG #2 
    a   # Match literal `a`
    |   # Or
    "(*SKIP)(*F)    # Match a `"` and skip over current match
)
(?(?=   # Start a conditional cluster, assuming a positive lookahead
    ((?&r)")    # Start of CG #3, recurs CG `r` and match `"`
  )     # End of condition
  \3    # If conditional passed match CG #3
 )  # End of conditional

如何找到并替换一个特定的字符,但仅当它在引号中?

Three-step approach

Last but not least...

最后但并非最不重要…

Matching a character inside quotation marks is tricky since delimiters are exactly the same so opening and closing marks can not be distinguished from each other without taking a look at adjacent strings. What you can do is change a delimiter to something else so that you can look for it later.

在引号内匹配一个字符是很困难的,因为分隔符是完全相同的,所以如果不查看相邻的字符串,就不能区分开和结束标记。您可以做的是将分隔符更改为其他内容,以便以后可以查找它。

Step 1:

Search for: "[^"\\]*(?:\\.[^"\\]*)*"

搜索:“[^ " \ \]*(?:\ \[^。”\ \]*)*”

Replace with: $0Я

替换为:$ 0Я

Step 2:

Search for: a(?=[^"\\]*(?:\\.[^"\\]*)*"Я)

搜索:(? =[^ " \ \]*(?:\ \[^。”\ \]*)*”Я)

Replace with whatever you expect.

用你所期望的代替。

Step 3:

Search for:

搜索:“Я

Replace with nothing to revert every thing.

用没有替换的东西来还原所有的东西。


#2


2  

Firstly a few of considerations:

首先考虑几点:

  1. There could be multiple a characters within a single quote.
  2. 在一个引用中可能有多个a字符。
  3. Each quote (using single or double quotation marks) consists of an opening quote character, some text and the same closing quote character. A simple approach is to assume that when the quote characters are counted sequentially, the odd ones are opening quotes and the even ones are closing quotes.
  4. 每个引语(使用单引号或双引号)由一个开始引语字符、一些文本和相同的结束引语字符组成。一个简单的方法是假设当引用字符按顺序计数时,奇数是开引号,偶数是闭引号。
  5. Following point 2, it could be worth some further thought on whether single-quoted strings should be allowed. See the following example: It's a shame 'this quoted text' isn't quoted. Here, the simple approach would think there were two quoted strings: s a shame and isn. Another: This isn't a quote ...'this is' and 'it's unclear where this quote ends'. I've avoided attempting to tackle these complexities and gone with the simple approach below.
  6. 接下来的第2点,对于是否应该允许单引号字符串进行进一步的思考是值得的。请看下面的例子:很遗憾“这段引用的文字”没有被引用。在这里,简单的方法会认为有两个被引用的字符串:it ' s a shame and isn ' t。另一句:这不是引用……“这是”和“不清楚这句话的结尾是什么”。我避免尝试处理这些复杂的问题,而是采用下面的简单方法。

The bad news is that point 1 presents a bit of a problem, as a capturing group with a wildcard repeat character after it (e.g. (.*)*) will only capture the last captured "thing". But the good news is there's a way of getting around this within certain limits. Many regex engines will allow up to 99 capturing groups (*). So if we can make the assumption that there will be no more than 99 as in each quote (UPDATE ...or even if we can't - see step 3), we can do the following...

坏消息是,第1点带来了一点问题,因为在它之后(例如*)*具有通配符重复字符的捕获组只捕获最后捕获的“东西”。但好消息是有一种方法可以在一定范围内解决这个问题。许多regex引擎将允许多达99个捕获组(*)。因此,如果我们假设每个报价都不超过99(更新……)或者,即使我们不能——见步骤3),我们也可以做以下事情……

(*) Unfortunately my first port of call, Notepad++ doesn't - it only allows up to 9. Not sure about VS Code. But regex101 (used for the online demos below) does.

(*)不幸的是,我的第一个调用端口Notepad++没有——它最多只能支持9个。不确定VS代码。但是regex101(用于下面的在线演示)可以做到这一点。

TL;DR - What to do?

  1. Search for: "([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*"
  2. 搜索:[^“]”(*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*([^]*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*([^]*)*”
  3. Replace with: "\1\2\3\4\5\6\7\8\9\10\11\12\13\14\15\16\17\18\19\20\21\22\23\24\25\26\27\28\29\30\31\32\33\34\35\36\37\38\39\40\41\42\43\44\45\46\47\48\49\50\51\52\53\54\55\56\57\58\59\60\61\62\63\64\65\66\67\68\69\70\71\72\73\74\75\76\77\78\79\80\81\82\83\84\85\86\87\88\89\90\91\92\93\94\95\96\97\98\99"
  4. 替换为:“1 \ \ 2 \ 3 \ 4 \ 5 \ \ 6 7 8 \ \ 9 10 \ \ 11 \ 12 13 \ 14、15、16 \ 17 \ \ 18 19 \ \ 20 \ 21 \ 22 24 \ 25 \ \ 23 \ 26、27、28、29、30、31、32、33、34、35、36、37、38、39、40、41、42、43、44、45、46、47、48 \ 49 \ 50 51 \ \ 52 55 \ 53 \ 54 \ \ 56 \ 57 \ 58 59 \ \ 60 \ 61 \ 62 \ 63 \ 64 \ 65 \ 66 \ 67 \ 68 \ 69 \ 70 \ 71 \ 72 \ 73 \ 74 \ 75 \ 76 \ 77 \ 78 \ 79 \ 80 \ 81 \ 82 \ 83 \ 84 \ 85 \ 86 \ 87 \ 88 \ 89 \ 90 \ 91 \ 92 \ 93 \ 94 \ 95 \ 96 \ 97 \ 98 \ 99”
  5. (Optionally keep repeating steps the previous two steps if there's a possibility of > 99 such characters in a single quote until they've all been replaced).
  6. (如果有可能在一个引用中出现>99这样的字符,可以继续重复前两个步骤,直到它们全部被替换)。
  7. Repeat step 1 but replacing all " with ' in the regular expression, i.e: '([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*'
  9. Repeat the replace steps (the "Replace with" step and the optional repetition after it) for this pattern.
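If a script is acceptable, the repeated search-and-replace above can be collapsed into a single pass. The following is only a sketch in JavaScript and not part of the steps above: replaceInsideQuotes is a hypothetical helper, and it assumes that quotes are balanced, never span lines, contain no escaped quotes, and that stray apostrophes (as in don't) do not occur in the text.

```javascript
// Hypothetical helper: replace every `target` inside "..." or '...' on one line.
function replaceInsideQuotes(text, target, replacement) {
  // Opening quote, then any characters that are neither that quote nor a
  // newline, then the same closing quote.
  const quoted = /(["'])((?:(?!\1)[^\n])*)\1/g;
  return text.replace(quoted, (whole, quote, body) =>
    quote + body.split(target).join(replacement) + quote);
}

console.log(replaceInsideQuotes('Somebody once told me "apples" are good for you', 'a', 'b'));
// Somebody once told me "bpples" are good for you
```

Passing an empty string as the replacement drops the character instead of swapping it.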

Online demos

Please see the following regex101 demos, which could actually be used to perform the replacements if you're able to copy the whole text into the contents of "TEST STRING":

#3


1  

/(["'])(.*?)(a)(.*?\1)/g

With the replace pattern:

$1$2$4

As far as I'm aware, VS Code uses the same regex engine as JavaScript, which is why I've written my example in JS.

The problem with this is that if one set of quotes contains multiple a's, a single pass removes only one of them. So there needs to be some code behind it (or you, hammering the replace button until no more matches are found) to reapply the pattern until all the a's between quotes are gone.

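// Matches one 'a' between a pair of identical quotes on the same line.
const regex = /(["'])(.*?)(a)(.*?\1)/g;
// Keeps everything except the captured 'a' (group 3).
const subst = `$1$2$4`;
let str = `"a"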
"helapke"
Not matched - aaaaaaa
"This is the way the world ends"
"Not with fire"
"ABBA"
"abba",
'I can haz cheezburger'
"This is not a match'
`;


// Loop to get rid of multiple a's in quotes
while(str.match(regex)){
    str = str.replace(regex, subst);
}

const result = str;
console.log(result);
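To see why the loop is needed, it can help to record the text after each pass. This is a small sketch using the same pattern; onePass and steps are illustrative names, not part of the answer's code.

```javascript
// Apply the answer's pattern once; each match spans a whole quoted string,
// so only one 'a' per quoted string is removed per pass.
const onePass = (s) => s.replace(/(["'])(.*?)(a)(.*?\1)/g, '$1$2$4');

let text = '"banana"';
const steps = [text];
let next = onePass(text);
while (next !== text) {      // stop once a pass changes nothing
  text = next;
  steps.push(text);
  next = onePass(text);
}

console.log(steps);
// [ '"banana"', '"bnana"', '"bnna"', '"bnn"' ]
```

Comparing the result of a pass with its input avoids calling the global regex twice per iteration.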

#4


1  

If you can use Visual Studio (instead of Visual Studio Code), it is written in C++ and C# and uses the .NET Framework regular expressions, which means you can use variable length lookbehinds to accomplish this.

(?<="[^"\n]*)a(?=[^"\n]*")

Adding some more logic to the above regular expression, we can tell it to ignore any location that is preceded by an even number of ". This prevents matches for an a outside of quotes. Take, for example, the string "a" a "a". Only the first and last a in this string will be matched; the one in the middle will be ignored.

(?<!^[^"\n]*(?:(?:"[^"\n]*){2})+)(?<="[^"\n]*)a(?=[^"\n]*")
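The difference between the two patterns can be checked outside Visual Studio as well. The sketch below assumes a JavaScript runtime with ES2018 lookbehind support (for example a recent Node.js), where lookbehinds may also be variable length; the variable names are illustrative. Without the m flag, ^ means start of input in JavaScript, so the sample is kept to a single line.

```javascript
const sample = '"a" a "a"';

// First pattern: only requires a quote somewhere before and after on the line.
const naive = /(?<="[^"\n]*)a(?=[^"\n]*")/g;
// Second pattern: additionally rejects positions preceded by an even number of quotes.
const balanced = /(?<!^[^"\n]*(?:(?:"[^"\n]*){2})+)(?<="[^"\n]*)a(?=[^"\n]*")/g;

console.log(sample.replace(naive, 'b'));    // "b" b "b"
console.log(sample.replace(balanced, 'b')); // "b" a "b"
```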

Now the only problem is that this breaks if there is an escaped " between two double quotes, as in "a\"" a "a". We need to add more logic to prevent this behaviour. Luckily, this beautiful answer exists for properly matching escaped ". Adding this logic to the regex above, we get the following:

(?<!^[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+)(?<="[^"\n]*)a(?=[^"\n]*")
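As a quick check of the escaped-quote case, here is a sketch that runs the previous pattern and this one on the example string. It again assumes a JavaScript runtime with ES2018 lookbehind support; the variable names are illustrative.

```javascript
// The text "a\"" a "a" (the backslash is doubled only for the JS string literal).
const escapedSample = '"a\\"" a "a"';

// Previous pattern: counts the escaped quote as a real one.
const withoutEscapes = /(?<!^[^"\n]*(?:(?:"[^"\n]*){2})+)(?<="[^"\n]*)a(?=[^"\n]*")/g;
// This pattern: treats backslash plus any character as part of the string body.
const withEscapes = /(?<!^[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+)(?<="[^"\n]*)a(?=[^"\n]*")/g;

console.log(escapedSample.replace(withoutEscapes, 'b')); // "b\"" b "a"
console.log(escapedSample.replace(withEscapes, 'b'));    // "b\"" a "b"
```

The first result replaces the a outside the quotes and misses the last quoted one; the second gets both right.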

I'm not sure which method works best with your strings, but I'll explain this last regex in detail as it also explains the two previous ones.

  • (?<!^[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+) Negative lookbehind ensuring what precedes doesn't match the following
    • ^ Assert position at the start of the line
    • [^"\n]* Match anything except " or \n any number of times
    • (?:(?:"(?:[^"\\\n]|\\.)*){2})+ Match the following one or more times. This ensures if there are any " preceding the match that they are balanced in the sense that there is an opening and closing double quote.
      • (?:"(?:[^"\\\n]|\\.)*){2} Match the following exactly twice
        • " Match this literally
        • (?:[^"\\\n]|\\.)* Match either of the following any number of times
          • [^"\\\n] Match anything except ", \ and \n
          • \\. Matches \ followed by any character
  • (?<="[^"\n]*) Positive lookbehind ensuring what precedes matches the following
    • " Match this literally
    • [^"\n]* Match anything except " or \n any number of times
  • a Match this literally
  • (?=[^"\n]*") Positive lookahead ensuring what follows matches the following
    • [^"\n]* Match anything except " or \n any number of times
    • " Match this literally

You can drop the \n from the above pattern, as the following shows. I added it just in case there are special cases I'm not considering (e.g. comments) that could break this regex within your text. The \A also forces the regex to match from the start of the string (or file) instead of the start of the line.

(?<!\A[^"]*(?:(?:"(?:[^"\\]|\\.)*){2})+)(?<="[^"]*)a(?=[^"]*")
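JavaScript has no \A, but without the m flag ^ already means start of input, so a rough equivalent can be tried there. This is a sketch under that assumption (plus ES2018 lookbehind support); the names are illustrative.

```javascript
// Same pattern with \A written as ^ (no m flag, so ^ is start of input).
const wholeInput = /(?<!^[^"]*(?:(?:"(?:[^"\\]|\\.)*){2})+)(?<="[^"]*)a(?=[^"]*")/g;

// The first quoted string spans two lines.
const multiLine = 'He said "aah\nha" and ate "a"';

console.log(JSON.stringify(multiLine.replace(wholeInput, 'b')));
// "He said \"bbh\nhb\" and ate \"b\""
```

With the \n exclusions removed, a quoted string that spans lines is handled as one unit.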

You can test this regex here

This is what it looks like in Visual Studio:


#5


0  

I am using VSCode, but I'm open to any suggestions.

If you want to stay in an editor environment, you could use
Visual Studio (>= 2012) or even Notepad++ for a quick fixup.
This avoids having to use an extra script environment.

Both of these engines (.NET and Boost, respectively) support the \G construct,
which starts the next match at the position where the last one left off.

Again, this is just a suggestion.

This regex doesn't check the validity of balanced quotes within the entire
string ahead of time (but it could with the addition of a single line).

It is all about knowing where the inside and outside of quotes are.

I've commented the regex, but if you need more info let me know.
Again this is just a suggestion (I know your editor uses ECMAScript).

Find (?s)(?:^([^"]*(?:"[^"a]*(?=")"[^"]*(?="))*"[^"a]*)|(?!^)\G)a([^"a]*(?:(?=a.*?")|(?:"[^"]*$|"[^"]*(?=")(?:"[^"a]*(?=")"[^"]*(?="))*"[^"a]*)))
Replace $1b$2

That's all there is to it.

https://regex101.com/r/loLFYH/1

Comments

(?s)                          # Dot-all inline modifier
 (?:
      ^                             # BOS 
      (                             # (1 start), Find first quote from BOS (written back)
           [^"]* 
           (?:                           # --- Cluster
                " [^"a]*                      # Inside quotes with no 'a'
                (?= " )
                " [^"]*                       # Between quotes, get up to next quote
                (?= " )
           )*                            # --- End cluster, 0 to many times

           " [^"a]*                      # Inside quotes, will be an 'a' ahead of here
                                         # to be sucked up by this match           
      )                             # (1 end)

   |                              # OR,

      (?! ^ )                       # Not-BOS 
      \G                            # Continue where left off from last match.
                                    # Must be an 'a' at this point
 )
 a                             # The 'a' to be replaced

 (                             # (2 start), Up to the next 'a' (to be written back)
      [^"a]* 
      (?:                           # --------------------
           (?= a .*? " )                 # If stopped before 'a', must be a quote ahead
        |                              # or,
           (?:                           # --------------------
                " [^"]* $                     # If stopped at a quote, check for EOS
             |                              # or, 
                " [^"]*                       # Between quotes, get up to next quote
                (?= " )

                (?:                           # --- Cluster
                     " [^"a]*                      # Inside quotes with no 'a'
                     (?= " )
                     " [^"]*                       # Between quotes 
                     (?= " )
                )*                            # --- End cluster, 0 to many times

                " [^"a]*                      # Inside quotes, will be an 'a' ahead of here
                                              # to be sucked up on the next match                    
           )                             # --------------------
      )                             # --------------------
 )                             # (2 end)
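The idea of knowing where the inside and outside of quotes are can also be expressed without \G, which JavaScript does not support. The sketch below is a plain loop swapped in for the regex, not a translation of it: replaceInDoubleQuotes is a hypothetical helper that toggles a flag at every double quote, and it assumes no escaped quotes. Unlike the regex, it treats an unclosed final quote as still open.

```javascript
// Hypothetical helper: track inside/outside state while scanning the text.
function replaceInDoubleQuotes(text, target, replacement) {
  let inside = false;   // true while between an opening and a closing "
  let out = '';
  for (const ch of text) {
    if (ch === '"') inside = !inside;
    out += inside && ch === target ? replacement : ch;
  }
  return out;
}

console.log(replaceInDoubleQuotes('The boy said "aaah!" when eating aardvark', 'a', 'b'));
// The boy said "bbbh!" when eating aardvark
```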

#6


0  

"Inside double quotes" is rather tricky, because there are many complicating scenarios to consider to fully automate this.

What are your precise rules for "enclosed by quotes"? Do you need to consider multi-line quotes? Do you have quoted strings containing escaped quotes or quotes used other than starting/ending string quotation?

However there may be a fairly simple expression to do much of what you want.

Search expression: ("[^a"]*)a

Replacement expression: $1b

This doesn't consider inside or outside of quotes - you have to do that visually. But it highlights text from the quote to the matching character, so you can quickly decide whether the match is inside or not.
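A short sketch in JavaScript shows why the visual check matters when this expression is applied blindly with replace-all; the variable names are illustrative.

```javascript
const search = /("[^a"]*)a/g;
const line = 'Somebody once told me "apples" are good for you';

// The closing quote also starts a match, so the a in "are" is replaced too.
console.log(line.replace(search, '$1b'));
// Somebody once told me "bpples" bre good for you
```

Stepping through the matches one at a time in the editor, rather than replacing all, is what lets you skip the second hit.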

If you can live with the visual inspection, then we can build up this pattern to include different quote types and upper and lower case.
