
Time: 2022-12-27 19:21:21

Problem: I have thousands of documents that contain a specific character I don't want, e.g. the character a. These documents contain a variety of characters, but the a's I want to replace are inside double quotes or single quotes.


I would like to find and replace them, and I thought using Regex would be needed. I am using VSCode, but I'm open to any suggestions.


My attempt: I was able to find the following regex to match for a specific string containing the values inside the ().



However, this only highlights the entire quote. I want to highlight the character only.


Any solution, perhaps outside of regex, is welcome.


Example outcomes: given that the character is a, find it and replace it with b:


Somebody once told me "apples" are good for you => Somebody once told me "bpples" are good for you


"Aardvarks" make good kebabs => "Abrdvbrks" make good kebabs


The boy said "aaah!" when his mom told him he was eating aardvark => The boy said "bbbh!" when his mom told him he was eating aardvark


6 Solutions



Visual Studio Code

VS Code uses the JavaScript regex engine for its find / replace functionality. This means you are quite limited in what you can do with regex compared to other flavors such as .NET or PCRE.

Luckily, this flavor supports lookaheads, and a lookahead lets you look for characters without consuming them. So one way to ensure that we are inside a quoted string is to require that, after matching an a, the number of quotes from there down to the end of the file / subject string is odd:
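
A minimal sketch of the odd-quote-count lookahead described here, written in JavaScript. The exact pattern and the helper name `replaceQuotedA` are assumptions made for illustration, not necessarily the pattern this answer used, and escaped quotes are not handled.

```javascript
// Assumed pattern: match an `a` only when an odd number of `"` characters
// follows it up to the end of the text, i.e. when the `a` sits inside
// a double-quoted string that is still open at that point.
const insideDoubleQuotes = /a(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)/g;

// Hypothetical helper: replace every quoted `a` with `replacement`.
function replaceQuotedA(text, replacement) {
  return text.replace(insideDoubleQuotes, replacement);
}

console.log(replaceQuotedA('Somebody once told me "apples" are good for you', 'b'));
// → Somebody once told me "bpples" are good for you
```

Because the lookahead rescans the rest of the text for every candidate a, this is likely to be slow on very large inputs.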



Live demo


This looks for a's in a double-quoted string; to apply it to single-quoted strings, substitute every " with '. You can't handle both at the same time.


There is a problem with the regex above, however: it conflicts with escaped double quotes inside double-quoted strings. If that matters and you want to handle them too, you have a long way to go:



Applying these approaches to large files will probably result in a stack overflow, so let's look at a better approach.


I am using VSCode, but I'm open to any suggestions.


That's great. Then I'd suggest using awk, sed, or something more programmatic to achieve what you are after. Or, if you are able to use Sublime Text, there is a chance to work around this problem in a more elegant way.
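
As one possible reading of "something more programmatic", here is a small character-by-character scanner in JavaScript instead of awk or sed. The function name `replaceInsideQuotes` and its parameters are hypothetical. The sketch assumes that quotes open and close with the same character, that a backslash escapes the next character inside a quoted span, and that escaped characters are left untouched.

```javascript
// Hypothetical helper: replace `from` with `to` only inside quoted spans.
function replaceInsideQuotes(text, from, to, quoteChars = ['"']) {
  let out = '';
  let open = null; // quote character that opened the current span, or null
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (open !== null && ch === '\\' && i + 1 < text.length) {
      // Escape sequence inside quotes: copy both characters unchanged.
      out += ch + text[i + 1];
      i++;
    } else if (open === null && quoteChars.includes(ch)) {
      open = ch; // entering a quoted span
      out += ch;
    } else if (ch === open) {
      open = null; // leaving the quoted span
      out += ch;
    } else {
      out += open !== null && ch === from ? to : ch;
    }
  }
  return out;
}

console.log(replaceInsideQuotes('"Aardvarks" make good kebabs', 'a', 'b'));
// → "Abrdvbrks" make good kebabs
```

A single pass like this runs in linear time and never backtracks, which sidesteps the large-file problem mentioned above.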


Sublime Text

This is supposed to work on large files with hundreds of thousands of lines. Note that it works for a single character (here a); with some modifications it may work for a word or substring too:


Search for:


                           ^              ^            ^

Replace it with: WHATEVER\3


Live demo


RegEx Breakdown:


(?: # Beginning of non-capturing group #1
    "   # Match a `"`
    |   # Or
    \G(?<!")(?!\A)  # Continue matching from last successful match
                    # It shouldn't start right after a `"`
)   # End of NCG #1
(?<r>   # Start of capturing group `r`
    [^a"\\]*+   # Match anything except `a`, `"` or a backslash (possessively)
    (?>\\.[^a"\\]*)*+   # Match an escaped character or 
                        # repeat last pattern as much as possible
)\K     # End of CG `r`, reset all consumed characters
(   # Start of CG #2 
    a   # Match literal `a`
    |   # Or
    "(*SKIP)(*F)    # Match a `"` and skip over current match
)   # End of CG #2
(?(?=   # Start a conditional cluster, assuming a positive lookahead
    ((?&r)")    # Start of CG #3, recurs CG `r` and match `"`
  )     # End of condition
  \3    # If conditional passed match CG #3
 )  # End of conditional


Three-step approach

Last but not least...


Matching a character inside quotation marks is tricky because the delimiters are exactly the same, so opening and closing marks cannot be distinguished from each other without looking at adjacent strings. What you can do is change a delimiter to something else so that you can look for it later.


Step 1:

Search for: "[^"\\]*(?:\\.[^"\\]*)*"


Replace with: $0Я


Step 2:

Search for: a(?=[^"\\]*(?:\\.[^"\\]*)*"Я)


Replace with whatever you expect.


Step 3:

Search for: Я


Replace with nothing to revert everything.
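
The three steps can also be sketched as one JavaScript function. The name `threeStepReplace` is hypothetical, the marker Я is assumed not to occur anywhere in the input, and JavaScript's `String.prototype.replace` writes the whole match as `$&` where the editor replacement above uses `$0`.

```javascript
const MARK = 'Я'; // assumed not to occur anywhere in the input

// Hypothetical helper: replace every `a` inside double quotes.
function threeStepReplace(text, replacement) {
  // Step 1: append the marker to every complete double-quoted string.
  const tagged = text.replace(/"[^"\\]*(?:\\.[^"\\]*)*"/g, '$&' + MARK);
  // Step 2: replace each `a` that is followed, within the same string,
  // by a closing quote that carries the marker.
  const replaced = tagged.replace(/a(?=[^"\\]*(?:\\.[^"\\]*)*"Я)/g, replacement);
  // Step 3: strip the markers again.
  return replaced.split(MARK).join('');
}

console.log(threeStepReplace('"a" a "a"', 'b'));
// → "b" a "b"
```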




First, a few considerations:


  1. There could be multiple a characters within a single quote.
  2. Each quote (using single or double quotation marks) consists of an opening quote character, some text and the same closing quote character. A simple approach is to assume that when the quote characters are counted sequentially, the odd ones are opening quotes and the even ones are closing quotes.
  3. Following point 2, it could be worth some further thought on whether single-quoted strings should be allowed. See the following example: It's a shame 'this quoted text' isn't quoted. Here, the simple approach would think there were two quoted strings: 's a shame ' and ' isn'. Another: This isn't a quote ...'this is' and 'it's unclear where this quote ends'. I've avoided attempting to tackle these complexities and gone with the simple approach below.

The bad news is that point 1 presents a bit of a problem, as a capturing group with a wildcard repeat character after it (e.g. (.*)*) will only capture the last captured "thing". But the good news is there's a way of getting around this within certain limits. Many regex engines will allow up to 99 capturing groups (*). So if we can make the assumption that there will be no more than 99 a's in each quote (UPDATE ...or even if we can't - see step 3), we can do the following...


(*) Unfortunately my first port of call, Notepad++ doesn't - it only allows up to 9. Not sure about VS Code. But regex101 (used for the online demos below) does.


TL;DR - What to do?

  1. Search for: "([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*"
  2. 搜索:[^“]”(*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*([^]*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*([^]*)*”
  3. Replace with: "\1\2\3\4\5\6\7\8\9\10\11\12\13\14\15\16\17\18\19\20\21\22\23\24\25\26\27\28\29\30\31\32\33\34\35\36\37\38\39\40\41\42\43\44\45\46\47\48\49\50\51\52\53\54\55\56\57\58\59\60\61\62\63\64\65\66\67\68\69\70\71\72\73\74\75\76\77\78\79\80\81\82\83\84\85\86\87\88\89\90\91\92\93\94\95\96\97\98\99"
  4. 替换为:“1 \ \ 2 \ 3 \ 4 \ 5 \ \ 6 7 8 \ \ 9 10 \ \ 11 \ 12 13 \ 14、15、16 \ 17 \ \ 18 19 \ \ 20 \ 21 \ 22 24 \ 25 \ \ 23 \ 26、27、28、29、30、31、32、33、34、35、36、37、38、39、40、41、42、43、44、45、46、47、48 \ 49 \ 50 51 \ \ 52 55 \ 53 \ 54 \ \ 56 \ 57 \ 58 59 \ \ 60 \ 61 \ 62 \ 63 \ 64 \ 65 \ 66 \ 67 \ 68 \ 69 \ 70 \ 71 \ 72 \ 73 \ 74 \ 75 \ 76 \ 77 \ 78 \ 79 \ 80 \ 81 \ 82 \ 83 \ 84 \ 85 \ 86 \ 87 \ 88 \ 89 \ 90 \ 91 \ 92 \ 93 \ 94 \ 95 \ 96 \ 97 \ 98 \ 99”
  5. (Optionally keep repeating steps the previous two steps if there's a possibility of > 99 such characters in a single quote until they've all been replaced).
  6. (如果有可能在一个引用中出现>99这样的字符,可以继续重复前两个步骤,直到它们全部被替换)。
  7. Repeat step 1 but replacing all " with ' in the regular expression, i.e: '([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*'
  8. 重复第1步,但在正则表达式i中将“all”替换为“' i”。艾凡:”([^ ']*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*”
  9. Repeat steps 2-3.
  10. 重复步骤2 - 3。
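
Outside an editor's find/replace dialog, the capturing-group limit can be avoided altogether. The following is a minimal JavaScript sketch, not part of the steps above: it finds each quoted run with one regex and rewrites the run in a replacer callback, so any number of occurrences per quote is handled in one pass. The function name `replaceInQuotedRuns` and its parameters are hypothetical, and it uses the same simple odd/even quote pairing as point 2.

```javascript
// Hypothetical helper: replace `from` with `to` inside each run delimited
// by `quote`, pairing quote characters in the order they appear.
function replaceInQuotedRuns(text, from, to, quote = '"') {
  const quoted = new RegExp(`${quote}[^${quote}]*${quote}`, 'g');
  // The callback receives each whole quoted run, delimiters included.
  return text.replace(quoted, (run) => run.split(from).join(to));
}

console.log(replaceInQuotedRuns('The boy said "aaah!" when his mom told him', 'a', 'b'));
// → The boy said "bbbh!" when his mom told him
```

Calling it a second time with `'` as the last argument covers single-quoted strings, with the same apostrophe caveats as point 3.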

Online demos

Please see the following regex101 demos, which could actually be used to perform the replacements if you're able to copy the whole text into the contents of "TEST STRING":





With the replace pattern:



As far as I'm aware, VS Code uses the same regex engine as JavaScript, which is why I've written my example in JS.


The problem with this is that if you have multiple a's in one set of quotes, a single pass only handles one of them. So either there needs to be some code behind it that re-applies the pattern, or you have to hammer the replace button until no more matches are found, to get rid of all the a's between quotes.


let regex = /(["'])(.*?)(a)(.*?\1)/g,
subst = `$1$2$4`,
str = `"a"
Not matched - aaaaaaa
"This is the way the world ends"
"Not with fire"
'I can haz cheezburger'
"This is not a match'
`;

// Loop to get rid of multiple a's in quotes:
// each pass drops one a per match, so repeat until nothing matches
while (str.match(regex)) {
    str = str.replace(regex, subst);
}

const result = str;



If you can use Visual Studio (instead of Visual Studio Code), it is written in C++ and C# and uses .NET Framework regular expressions, which means you can use variable-length lookbehinds to accomplish this.


Adding some more logic to the above regular expression, we can tell it to ignore any locations that have an even number of " preceding them. This prevents matches for a outside of quotes. Take, for example, the string "a" a "a". Only the first and last a in this string will be matched, but the one in the middle will be ignored.



Now the only problem is this will break if we have escaped " within two double quotes such as "a\"" a "a". We need to add more logic to prevent this behaviour. Luckily, this beautiful answer exists for properly matching escaped ". Adding this logic to the regex above, we get the following:



I'm not sure which method works best with your strings, but I'll explain this last regex in detail as it also explains the two previous ones.


  • (?<!^[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+) Negative lookbehind ensuring what precedes doesn't match the following
    • ^ Assert position at the start of the line
    • [^"\n]* Match anything except " or \n any number of times
    • (?:(?:"(?:[^"\\\n]|\\.)*){2})+ Match the following one or more times. This ensures that any " preceding the match are balanced, in the sense that each opening double quote has a closing double quote.
      • (?:"(?:[^"\\\n]|\\.)*){2} Match the following exactly twice
        • " Match this literally
        • (?:[^"\\\n]|\\.)* Match either of the following any number of times
          • [^"\\\n] Match anything except ", \ and \n
          • \\. Match \ followed by any character
  • (?<="[^"\n]*) Positive lookbehind ensuring what precedes matches the following
    • " Match this literally
    • [^"\n]* Match anything except " or \n any number of times
  • a Match this literally
  • (?=[^"\n]*") Positive lookahead ensuring what follows matches the following
    • [^"\n]* Match anything except " or \n any number of times
    • " Match this literally
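
For readers without Visual Studio: JavaScript engines that implement ES2018 lookbehind assertions (for example V8, as used by Node.js) also accept variable-length lookbehinds. The sketch below assembles the four top-level pieces of the breakdown above into a single pattern. The assembled pattern, the multiline flag, and the variable name `quotedA` are assumptions, not text taken from this answer.

```javascript
// Pieces, in the order given in the breakdown:
//   negative lookbehind: the line so far must not consist of balanced quotes
//   positive lookbehind: an opening " precedes on the same line
//   the literal a
//   positive lookahead:  a closing " follows on the same line
const quotedA =
  /(?<!^[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+)(?<="[^"\n]*)a(?=[^"\n]*")/gm;

console.log('"a" a "a"'.replace(quotedA, 'b'));
// → "b" a "b"
```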

You can drop the \n from the above pattern as the following suggests. I added it just in case there are special cases I'm not considering (e.g. comments) that could break this regex within your text. The \A also forces the regex to match from the start of the string (or file) instead of the start of the line.



You can test this regex here


This is what it looks like in Visual Studio:





I am using VSCode, but I'm open to any suggestions.


If you want to stay in an editor environment, you could use
Visual Studio (>= 2012) or even Notepad++ for a quick fixup.
This avoids having to use a spurious script environment.

Both of these engines (.NET and Boost, respectively) support the \G construct,
which starts the next match at the position where the last one left off.


Again, this is just a suggestion.


This regex doesn't check the validity of balanced quotes within the entire
string ahead of time (but it could with the addition of a single line).


It is all about knowing where the inside and outside of quotes are.


I've commented the regex, but if you need more info let me know.
Again this is just a suggestion (I know your editor uses ECMAScript).


Find (?s)(?:^([^"]*(?:"[^"a]*(?=")"[^"]*(?="))*"[^"a]*)|(?!^)\G)a([^"a]*(?:(?=a.*?")|(?:"[^"]*$|"[^"]*(?=")(?:"[^"a]*(?=")"[^"]*(?="))*"[^"a]*)))
Replace $1b$2


That's all there is to it.






(?s)                          # Dot-all inline modifier
 (?:                           # Cluster: anchor at BOS, or continue from the last match
      ^                             # BOS 
      (                             # (1 start), Find first quote from BOS (written back)
           [^"]*                         # Outside quotes, up to the first quote
           (?:                           # --- Cluster
                " [^"a]*                      # Inside quotes with no 'a'
                (?= " )
                " [^"]*                       # Between quotes, get up to next quote
                (?= " )
           )*                            # --- End cluster, 0 to many times

           " [^"a]*                      # Inside quotes, will be an 'a' ahead of here
                                         # to be sucked up by this match           
      )                             # (1 end)

   |                              # OR,

      (?! ^ )                       # Not-BOS 
      \G                            # Continue where left off from last match.
                                    # Must be an 'a' at this point
 )                             # End cluster
 a                             # The 'a' to be replaced

 (                             # (2 start), Up to the next 'a' (to be written back)
      [^"a]*                        # Inside quotes, up to the next 'a' or quote
      (?:                           # --------------------
           (?= a .*? " )                 # If stopped before 'a', must be a quote ahead
        |                              # or,
           (?:                           # --------------------
                " [^"]* $                     # If stopped at a quote, check for EOS
             |                              # or, 
                " [^"]*                       # Between quotes, get up to next quote
                (?= " )

                (?:                           # --- Cluster
                     " [^"a]*                      # Inside quotes with no 'a'
                     (?= " )
                     " [^"]*                       # Between quotes 
                     (?= " )
                )*                            # --- End cluster, 0 to many times

                " [^"a]*                      # Inside quotes, will be an 'a' ahead of here
                                              # to be sucked up on the next match                    
           )                             # --------------------
      )                             # --------------------
 )                             # (2 end)



"Inside double quotes" is rather tricky, because there are many complicating scenarios to consider to fully automate this.


What are your precise rules for "enclosed by quotes"? Do you need to consider multi-line quotes? Do you have quoted strings containing escaped quotes or quotes used other than starting/ending string quotation?


However there may be a fairly simple expression to do much of what you want.


Search expression: ("[^a"]*)a


Replacement expression: $1b


This doesn't consider inside or outside of quotes - you have to do that visually. But it highlights text from the quote to the matching character, so you can quickly decide if this is inside or not.


If you can live with the visual inspection, then we can build up this pattern to include different quote types and upper and lower case.




Visual Studio Code

VS Code uses JavaScript RegEx engine for its find / replace functionality. This means you are very limited in working with regex in comparison to other flavors like .NET or PCRE.

VS代码使用JavaScript RegEx引擎进行查找/替换功能。这意味着与. net或PCRE等其他版本相比,您在使用regex方面非常有限。

Lucky enough that this flavor supports lookaheads and with lookaheads you are able to look for but not consume character. So one way to ensure that we are within a quoted string is to look for number of quotes down to bottom of file / subject string to be odd after matching an a:



Live demo


This looks for as in a double quoted string, to have it for single quoted strings substitute all "s with '. You can't have both at a time.


There is a problem with regex above however, that it conflicts with escaped double quotes within double quoted strings. To match them too if it matters you have a long way to go:



Applying these approaches on large files probably will result in an stack overflow so let's see a better approach.


I am using VSCode, but I'm open to any suggestions.


That's great. Then I'd suggest to use awk or sed or something more programmatic in order to achieve what you are after or if you are able to use Sublime Text a chance exists to work around this problem in a more elegant way.


Sublime Text

This is supposed to work on large files with hundred of thousands of lines but care that it works for a single character (here a) that with some modifications may work for a word or substring too:


Search for:


                           ^              ^            ^

Replace it with: WHATEVER\3

换成:无论\ 3

Live demo


RegEx Breakdown:


(?: # Beginning of non-capturing group #1
    "   # Match a `"`
    |   # Or
    \G(?<!")(?!\A)  # Continue matching from last successful match
                    # It shouldn't start right after a `"`
)   # End of NCG #1
(?<r>   # Start of capturing group `r`
    [^a"\\]*+   # Match anything except `a`, `"` or a backslash (possessively)
    (?>\\.[^a"\\]*)*+   # Match an escaped character or 
                        # repeat last pattern as much as possible
)\K     # End of CG `r`, reset all consumed characters
(   # Start of CG #2 
    a   # Match literal `a`
    |   # Or
    "(*SKIP)(*F)    # Match a `"` and skip over current match
(?(?=   # Start a conditional cluster, assuming a positive lookahead
    ((?&r)")    # Start of CG #3, recurs CG `r` and match `"`
  )     # End of condition
  \3    # If conditional passed match CG #3
 )  # End of conditional


Three-step approach

Last but not least...


Matching a character inside quotation marks is tricky since delimiters are exactly the same so opening and closing marks can not be distinguished from each other without taking a look at adjacent strings. What you can do is change a delimiter to something else so that you can look for it later.


Step 1:

Search for: "[^"\\]*(?:\\.[^"\\]*)*"

搜索:“[^ " \ \]*(?:\ \[^。”\ \]*)*”

Replace with: $0Я

替换为:$ 0Я

Step 2:

Search for: a(?=[^"\\]*(?:\\.[^"\\]*)*"Я)

搜索:(? =[^ " \ \]*(?:\ \[^。”\ \]*)*”Я)

Replace with whatever you expect.


Step 3:

Search for:


Replace with nothing to revert every thing.




Firstly a few of considerations:


  1. There could be multiple a characters within a single quote.
  2. 在一个引用中可能有多个a字符。
  3. Each quote (using single or double quotation marks) consists of an opening quote character, some text and the same closing quote character. A simple approach is to assume that when the quote characters are counted sequentially, the odd ones are opening quotes and the even ones are closing quotes.
  4. 每个引语(使用单引号或双引号)由一个开始引语字符、一些文本和相同的结束引语字符组成。一个简单的方法是假设当引用字符按顺序计数时,奇数是开引号,偶数是闭引号。
  5. Following point 2, it could be worth some further thought on whether single-quoted strings should be allowed. See the following example: It's a shame 'this quoted text' isn't quoted. Here, the simple approach would think there were two quoted strings: s a shame and isn. Another: This isn't a quote ...'this is' and 'it's unclear where this quote ends'. I've avoided attempting to tackle these complexities and gone with the simple approach below.
  6. 接下来的第2点,对于是否应该允许单引号字符串进行进一步的思考是值得的。请看下面的例子:很遗憾“这段引用的文字”没有被引用。在这里,简单的方法会认为有两个被引用的字符串:it ' s a shame and isn ' t。另一句:这不是引用……“这是”和“不清楚这句话的结尾是什么”。我避免尝试处理这些复杂的问题,而是采用下面的简单方法。

The bad news is that point 1 presents a bit of a problem, as a capturing group with a wildcard repeat character after it (e.g. (.*)*) will only capture the last captured "thing". But the good news is there's a way of getting around this within certain limits. Many regex engines will allow up to 99 capturing groups (*). So if we can make the assumption that there will be no more than 99 as in each quote (UPDATE ...or even if we can't - see step 3), we can do the following...


(*) Unfortunately my first port of call, Notepad++ doesn't - it only allows up to 9. Not sure about VS Code. But regex101 (used for the online demos below) does.


TL;DR - What to do?

  1. Search for: "([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*"
  2. 搜索:[^“]”(*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*([^]*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*((^”)*)*([^]*)*”
  3. Replace with: "\1\2\3\4\5\6\7\8\9\10\11\12\13\14\15\16\17\18\19\20\21\22\23\24\25\26\27\28\29\30\31\32\33\34\35\36\37\38\39\40\41\42\43\44\45\46\47\48\49\50\51\52\53\54\55\56\57\58\59\60\61\62\63\64\65\66\67\68\69\70\71\72\73\74\75\76\77\78\79\80\81\82\83\84\85\86\87\88\89\90\91\92\93\94\95\96\97\98\99"
  4. 替换为:“1 \ \ 2 \ 3 \ 4 \ 5 \ \ 6 7 8 \ \ 9 10 \ \ 11 \ 12 13 \ 14、15、16 \ 17 \ \ 18 19 \ \ 20 \ 21 \ 22 24 \ 25 \ \ 23 \ 26、27、28、29、30、31、32、33、34、35、36、37、38、39、40、41、42、43、44、45、46、47、48 \ 49 \ 50 51 \ \ 52 55 \ 53 \ 54 \ \ 56 \ 57 \ 58 59 \ \ 60 \ 61 \ 62 \ 63 \ 64 \ 65 \ 66 \ 67 \ 68 \ 69 \ 70 \ 71 \ 72 \ 73 \ 74 \ 75 \ 76 \ 77 \ 78 \ 79 \ 80 \ 81 \ 82 \ 83 \ 84 \ 85 \ 86 \ 87 \ 88 \ 89 \ 90 \ 91 \ 92 \ 93 \ 94 \ 95 \ 96 \ 97 \ 98 \ 99”
  5. (Optionally keep repeating steps the previous two steps if there's a possibility of > 99 such characters in a single quote until they've all been replaced).
  6. (如果有可能在一个引用中出现>99这样的字符,可以继续重复前两个步骤,直到它们全部被替换)。
  7. Repeat step 1 but replacing all " with ' in the regular expression, i.e: '([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*'
  8. 重复第1步,但在正则表达式i中将“all”替换为“' i”。艾凡:”([^ ']*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*([^]*)*”
  9. Repeat steps 2-3.
  10. 重复步骤2 - 3。

Online demos

Please see the following regex101 demos, which could actually be used to perform the replacements if you're able to copy the whole text into the contents of "TEST STRING":





With the replace pattern:



As far as I'm aware, VS Code uses the same regex engine as JavaScript, which is why I've written my example in JS.


The problem with this is that if you have multiple a's in 1 set of quotes, then it will struggle to pull out the right values, so there needs to be some sort of code behind it, or you, hammering the replace button until no more matches are found, to recurse the pattern and get rid of all the a's in between quotes


let regex = /(["'])(.*?)(a)(.*?\1)/g,
subst = `$1$2$4`,
str = `"a"
Not matched - aaaaaaa
"This is the way the world ends"
"Not with fire"
'I can haz cheezburger'
"This is not a match'

// Loop to get rid of multiple a's in quotes
    str = str.replace(regex, subst);

const result = str;



If you can use Visual Studio (instead of Visual Studio Code), it is written in C++ and C# and uses the .NET Framework regular expressions, which means you can use variable length lookbehinds to accomplish this.

如果您可以使用Visual Studio(而不是Visual Studio代码),它是用c++和c#编写的,并且使用。net框架正则表达式,这意味着您可以使用可变长度的lookbehind来实现这一点。


Adding some more logic to the above regular expression, we can tell it to ignore any locations where there are an even amount of " preceding it. This prevents matches for a outside of quotes. Take, for example, the string "a" a "a". Only the first and last a in this string will be matched, but the one in the middle will be ignored.



Now the only problem is this will break if we have escaped " within two double quotes such as "a\"" a "a". We need to add more logic to prevent this behaviour. Luckily, this beautiful answer exists for properly matching escaped ". Adding this logic to the regex above, we get the following:



I'm not sure which method works best with your strings, but I'll explain this last regex in detail as it also explains the two previous ones.


  • (?<!^[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+) Negative lookbehind ensuring what precedes doesn't match the following
    • ^ Assert position at the start of the line
    • [^"\n]* Match anything except " or \n any number of times
    • (?:(?:"(?:[^"\\\n]|\\.)*){2})+ Match the following one or more times. This ensures that any " preceding the match are balanced, in the sense that each opening double quote has a closing one.
      • (?:"(?:[^"\\\n]|\\.)*){2} Match the following exactly twice
        • " Match this literally
        • (?:[^"\\\n]|\\.)* Match either of the following any number of times
          • [^"\\\n] Match anything except ", \ and \n
          • \\. Match \ followed by any character
  • (?<="[^"\n]*) Positive lookbehind ensuring what precedes matches the following
    • " Match this literally
    • [^"\n]* Match anything except " or \n any number of times
  • a Match this literally
  • (?=[^"\n]*") Positive lookahead ensuring what follows matches the following
    • [^"\n]* Match anything except " or \n any number of times
    • " Match this literally
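
As a rough check of the breakdown above, the three lookarounds can be assembled into a single pattern and tried in JavaScript. This is only a sketch: it assumes a runtime whose regex engine accepts variable-length lookbehind (ES2018 or later, for example a recent Node.js), it adds the m flag so that ^ means "start of each line" as the explanation assumes, and the names quotedA and replaceQuotedA are made up for illustration.

```javascript
// Pattern assembled from the pieces explained above (double quotes only).
// Assumption: the engine supports variable-length lookbehind (ES2018+).
// The m flag makes ^ match at the start of every line.
const quotedA = /(?<!^[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+)(?<="[^"\n]*)a(?=[^"\n]*")/gm;

// Hypothetical helper: replace every "a" that sits inside double quotes with "b".
function replaceQuotedA(text) {
  return text.replace(quotedA, 'b');
}

console.log(replaceQuotedA('"a" a "a"'));
// → "b" a "b"
console.log(replaceQuotedA('Somebody once told me "apples" are good for you'));
// → Somebody once told me "bpples" are good for you
```

The middle a in the first call is left alone because an even number of quotes precedes it on its line, which is exactly what the negative lookbehind rejects.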

You can drop the \n from the above pattern as the following suggests. I added it just in case there is some special case I'm not considering (e.g. comments) that could break this regex within your text. The \A also forces the regex to match from the start of the string (or file) instead of the start of the line.



You can test this regex here


This is what it looks like in Visual Studio:





I am using VSCode, but I'm open to any suggestions.


If you want to stay in an editor environment, you could use
Visual Studio (>= 2012) or even Notepad++ for a quick fix-up.
This avoids having to bring in a separate script environment.


Both of these engines (.NET and Boost, respectively) support the \G construct,
which starts the next match at the position where the last one left off.


Again, this is just a suggestion.


This regex doesn't check the validity of balanced quotes within the entire
string ahead of time (but it could with the addition of a single line).


It is all about knowing where the inside and outside of quotes are.


I've commented the regex, but if you need more info let me know.
Again this is just a suggestion (I know your editor uses ECMAScript).


Find (?s)(?:^([^"]*(?:"[^"a]*(?=")"[^"]*(?="))*"[^"a]*)|(?!^)\G)a([^"a]*(?:(?=a.*?")|(?:"[^"]*$|"[^"]*(?=")(?:"[^"a]*(?=")"[^"]*(?="))*"[^"a]*)))
Replace $1b$2


That's all there is to it.






(?s)                          # Dot-all inline modifier
      ^                             # BOS 
      (                             # (1 start), Find first quote from BOS (written back)
           (?:                           # --- Cluster
                " [^"a]*                      # Inside quotes with no 'a'
                (?= " )
                " [^"]*                       # Between quotes, get up to next quote
                (?= " )
           )*                            # --- End cluster, 0 to many times

           " [^"a]*                      # Inside quotes, will be an 'a' ahead of here
                                         # to be sucked up by this match           
      )                             # (1 end)

   |                              # OR,

      (?! ^ )                       # Not-BOS 
      \G                            # Continue where left off from last match.
                                    # Must be an 'a' at this point
 a                             # The 'a' to be replaced

 (                             # (2 start), Up to the next 'a' (to be written back)
      (?:                           # --------------------
           (?= a .*? " )                 # If stopped before 'a', must be a quote ahead
        |                              # or,
           (?:                           # --------------------
                " [^"]* $                     # If stopped at a quote, check for EOS
             |                              # or, 
                " [^"]*                       # Between quotes, get up to next quote
                (?= " )

                (?:                           # --- Cluster
                     " [^"a]*                      # Inside quotes with no 'a'
                     (?= " )
                     " [^"]*                       # Between quotes 
                     (?= " )
                )*                            # --- End cluster, 0 to many times

                " [^"a]*                      # Inside quotes, will be an 'a' ahead of here
                                              # to be sucked up on the next match                    
           )                             # --------------------
      )                             # --------------------
 )                             # (2 end)
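
Where \G is not available (JavaScript has no such anchor), the same idea of tracking what is inside and what is outside the quotes can be approximated differently: match each quoted run as a whole and rewrite only its contents. The sketch below is a swapped-in technique, not the regex above. It assumes double quotes that are balanced and never escaped, and replaceInsideQuotes is a hypothetical name.

```javascript
// Hypothetical helper: rewrite characters only inside "..." runs.
// Assumes balanced, unescaped double quotes.
function replaceInsideQuotes(text, from, to) {
  // Each match is one complete quoted run, including both quote marks.
  return text.replace(/"[^"]*"/g, (quoted) => quoted.split(from).join(to));
}

console.log(replaceInsideQuotes('"Aardvarks" make good kebabs', 'a', 'b'));
// → "Abrdvbrks" make good kebabs
```

Because every match consumes a whole opening/closing pair, text between two quoted runs is never touched, which is the property the \G regex works to preserve.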



"Inside double quotes" is rather tricky, because there are many complicating scenarios to consider to fully automate this.


What are your precise rules for "enclosed by quotes"? Do you need to consider multi-line quotes? Do you have quoted strings containing escaped quotes or quotes used other than starting/ending string quotation?


However there may be a fairly simple expression to do much of what you want.


Search expression: ("[^a"]*)a


Replacement expression: $1b


This doesn't consider inside or outside of quotes - you have to do that visually. But it highlights the text from the quote to the matching character, so you can quickly decide whether it is inside or not.
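
To see how this search/replace pair behaves outside an editor, here is a small JavaScript sketch. It assumes the pair is reapplied until the text stops changing, since one pass replaces at most one a per quote, and replaceUntilStable is a hypothetical helper name. The second call shows the false positive that the visual inspection is meant to catch.

```javascript
// The search expression from above; the replacement used below is $1b.
const search = /("[^a"]*)a/g;

// Hypothetical helper: reapply the replacement until nothing changes.
function replaceUntilStable(text) {
  let previous;
  do {
    previous = text;
    text = text.replace(search, '$1b');
  } while (text !== previous);
  return text;
}

console.log(replaceUntilStable('say "aaah!"')); // → say "bbbh!"
console.log(replaceUntilStable('"x" a'));       // → "x" b  (the a was outside the quotes)
```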


If you can live with the visual inspection, then we can build up this pattern to include different quote types and upper and lower case.
