I'm having a hard time figuring out how to replace every whitespace character with '', except those that appear between double quotes (" ").
For example,
a = c + d;
becomes
a=c+d;
and
foo ("hi bye", "bye hi");
becomes
foo("hi bye","bye hi");
I have tried something like
re.sub('^(\"[^\"\n]*\")|\s|\\n', '', line)
but obviously that doesn't work.
2 Answers
#1
4
Find:
r'(".*?")|(\s+)'
Replace:
r'\1'
The idea is to leave everything inside quotes untouched: first match each quoted string (".*?") and replace it with itself (\1).
Any whitespace that remains (\s+) cannot be inside quotes, or the first alternative would have matched it instead, so it is replaced with nothing.
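In Python, the find/replace pair above might be applied roughly like this. This is only a sketch: the function name strip_unquoted_whitespace is made up for illustration, and a callable is used for the replacement instead of r'\1' because, as far as I know, Python versions before 3.5 raise an error when the replacement refers to a group that did not take part in the match.

```python
import re

# Group 1 captures a quoted string; group 2 captures a run of whitespace.
PATTERN = re.compile(r'(".*?")|(\s+)')

def strip_unquoted_whitespace(line):
    # Hypothetical helper name, not from the original answer.
    # A quoted string is put back unchanged; a whitespace run
    # (where group 1 is None) is replaced with the empty string.
    return PATTERN.sub(lambda m: m.group(1) or '', line)

print(strip_unquoted_whitespace('foo ("hi bye", "bye hi");'))
# foo("hi bye","bye hi");
print(strip_unquoted_whitespace('a = c + d;'))
# a=c+d;
```

Note that this pattern assumes quotes are balanced, contain no escaped quotes (\"), and do not span multiple lines; inputs that break those assumptions would need a more careful pattern.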
#2
1
Since you say in your comments that regex isn't required, I'm going to propose a novel concept: don't use regex.
Don't get me wrong. I love regex. It's an amazingly powerful tool, and it can handle almost anything you ask of it, if you're willing to write a complicated enough expression. There are times when regex is the perfect tool and replaces dozens of lines of code with one simple expression.
But this is a simple task that depends on one simple thing: you need to track whether you're currently inside a quote.
This code is so basic that people may even say it's not Pythonic. But it works, and anyone can read it.
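def kill_spaces(test_str):
    inside_quote = False  # True while we are between a pair of double quotes
    result = ""
    for character in test_str:
        # Keep every non-whitespace character, and whitespace only inside quotes.
        if not character.isspace() or inside_quote:
            result += character
        # Each double quote toggles the state.
        if character == '"':
            inside_quote = not inside_quote
    return result

test = 'foo ("hi bye", "bye hi");'
print(kill_spaces(test))
# foo("hi bye","bye hi");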