Regular expressions - replace specific characters except within specific strings

Date: 2022-06-02 00:20:08

I'm having a hard time figuring out how to replace every whitespace character with '', except those that appear between double quotes (" ").

For example:

a = c + d;

is

a=c+d;

and

foo ("hi bye",        "bye    hi");

is

foo("hi bye","bye    hi");

I have tried something like:

re.sub('^(\"[^\"\n]*\")|\s|\\n', '', line)

but obviously that doesn't work.

2 Answers

#1


4  

Find:

r'(".*?")|(\s+)'

Replace:

r'\1'

The idea is to leave everything inside quotes untouched: first match each quoted string (".*?") and replace it with itself (\1).

Any whitespace that is left (\s+) cannot be inside quotes (otherwise the first alternative would already have matched it), so it is replaced with nothing.
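As a rough sketch of how the find/replace pair above could be applied from Python with re.sub: the function name strip_outside_quotes is made up for illustration and is not part of the original answer. Passing r'\1' directly as the replacement relies on Python 3.5 or later, where an unmatched group is substituted with an empty string; the callable used here sidesteps that dependency. The second capture group is dropped because the replacement never refers to it.

```python
import re

# Quoted strings are captured in group 1; runs of whitespace are matched bare.
PATTERN = re.compile(r'(".*?")|\s+')

def strip_outside_quotes(line):
    # Hypothetical helper name, used only for this illustration.
    # Put back group 1 when a quoted string matched; otherwise the match was
    # whitespace and group 1 is None, so substitute the empty string.
    return PATTERN.sub(lambda m: m.group(1) or "", line)

print(strip_outside_quotes('a = c + d;'))
print(strip_outside_quotes('foo ("hi bye",        "bye    hi");'))
```

Like the pattern it wraps, this sketch does not handle escaped quotes inside a string, and ".*?" will not span a newline unless re.DOTALL is set.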



#2


1  

Since you say in your comments that regex isn't required, I'm going to propose a novel concept: don't use regex.


Don't get me wrong. I love regex. It's an amazingly powerful tool, and it can handle almost anything you ask of it if you're willing to write a complicated enough expression. There are times when regex is the perfect tool and replaces dozens of lines of code with one simple expression.

But this is a simple task that depends on one simple thing: you need to track whether you're inside a quote.

This code is so basic that people may even say it's not Pythonic. But it works, and anyone can read it.

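def kill_spaces(test_str):
    # Track whether the current character sits inside a double-quoted string.
    inside_quote = False
    result = ""
    for character in test_str:
        # Keep everything inside quotes; outside them, drop only spaces.
        if character != " " or inside_quote:
            result += character
        # A double quote toggles the state (escaped quotes are not handled).
        if character == '"':
            inside_quote = not inside_quote
    return result

test = 'foo ("hi bye",       "bye     hi");'
print(kill_spaces(test))
# prints: foo("hi bye","bye     hi");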
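The loop above only drops the space character, while the question asks about every whitespace character. A possible variant, assuming tabs and newlines outside quotes should go too, is sketched below; the name kill_whitespace is hypothetical, and collecting characters in a list before joining is just one common way to avoid repeated string concatenation.

```python
def kill_whitespace(text):
    # Hypothetical variant of kill_spaces: removes any whitespace character
    # (as reported by str.isspace) that is not inside double quotes.
    inside_quote = False
    kept = []
    for character in text:
        if inside_quote or not character.isspace():
            kept.append(character)
        # A double quote toggles the state (escaped quotes are not handled).
        if character == '"':
            inside_quote = not inside_quote
    return "".join(kept)

print(kill_whitespace('foo\t("hi bye",\n "bye\thi");'))
```

The state tracking is the same as in kill_spaces; only the whitespace test and the way the result is built differ.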
