Remove commas in a string surrounded by a comma and double quotes / Python

Date: 2020-12-27 21:43:08

I've found some similar topics on *, but I'm a newbie to Python and regular expressions.

I have a string

,"Completely renovated in 2009, the 2-star Superior Hotel Ibis Berlin Messe, with its 168 air-conditioned rooms, is located right next to Berlin's ICC and exhibition center. All rooms have Wi-Fi, and you can surf the Internet free of charge at two iPoint-PCs in the lobby. We provide a 24-hour bar, snacks and reception service. Enjoy our breakfast buffet from 4am to 12pm on the 8th floor, where you have a fantastic view across Berlin. You will find free car parking directly next to the hotel.",

The pattern looks like: comma, double quote | any text with commas | double quote, comma. I need to replace the commas inside the double quotes, for example with the @ character. Which regex pattern should I use?

I tried this:

r',"([.*]*,[.*]*)*",' 

with different variations, but it doesn't work.

Thanks for the answers; the problem is solved.

4 solutions

#1


If all you need to do is replace commas with the @ character, you should look into using str.replace() rather than a regex.

str_a = "Completely renovated in 2009, the 2-star Superior Hotel Ibis Berlin Messe, with its 168 air-conditioned rooms, is located right next to Berlin's ICC and exhibition center. All rooms have Wi-Fi, and you can surf the Internet free of charge at two iPoint-PCs in the lobby. We provide a 24-hour bar, snacks and reception service. Enjoy our breakfast buffet from 4am to 12pm on the 8th floor, where you have a fantastic view across Berlin. You will find free car parking directly next to the hotel."

str_a = str_a.replace('","', '@')  # replaces the three-character sequence "," (quote, comma, quote)
str_a = str_a.replace(',', '@')    # replaces every remaining comma

print(str_a)

Edit: Alternatively you could make a list of what you want to replace, then loop through it and perform the replacement. Ex:

to_replace = ['""', ',', '"']  # each substring is replaced with '@', in this order

str_a = "Completely renovated in 2009, the 2-star Superior Hotel Ibis Berlin Messe, with its 168 air-conditioned rooms, is located right next to Berlin's ICC and exhibition center. All rooms have Wi-Fi, and you can surf the Internet free of charge at two iPoint-PCs in the lobby. We provide a 24-hour bar, snacks and reception service. Enjoy our breakfast buffet from 4am to 12pm on the 8th floor, where you have a fantastic view across Berlin. You will find free car parking directly next to the hotel."

for a in to_replace:
    str_a = str_a.replace(a, '@')

print(str_a)
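As a quick check of what str.replace() does to input shaped like the question's (which, unlike str_a above, still carries the delimiting ," and ",), here is a minimal sketch using a shortened stand-in string; the name raw and its contents are hypothetical. Every comma is replaced, including the two delimiters, so this approach only fits when the string holds nothing but the quoted text.

```python
# Shortened stand-in for the question's input (hypothetical sample, not the full text).
raw = ',"Completely renovated in 2009, the hotel, with 168 rooms.",'

# str.replace has no notion of "inside the quotes": every comma is replaced.
result = raw.replace(',', '@')
print(result)  # @"Completely renovated in 2009@ the hotel@ with 168 rooms."@
```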

#2


Hmm, your regex is suspicious.

,"([.*]*,[.*]*)*",

[.*] will match either a literal dot or an asterisk (. and * become literals in character classes).
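A small sketch illustrating the point: inside brackets, . and * are plain characters, so [.*]* can only match runs of dots and asterisks. Run against a shortened stand-in for the question's string (a hypothetical sample), the original pattern finds nothing, because the character right after ," is a letter rather than a dot, asterisk, comma, or quote.

```python
import re

# Inside a character class, '.' and '*' lose their special meaning.
print(re.findall(r'[.*]', 'a.b*c'))   # ['.', '*']

# Shortened stand-in for the question's input (hypothetical sample).
sample = ',"Completely renovated in 2009, the hotel, with 168 rooms.",'

# The pattern from the question: after ," it needs '.', '*', ',' or '"' next.
match = re.search(r',"([.*]*,[.*]*)*",', sample)
print(match)   # None
```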

Additionally, even if this could match something in the string, you would only be able to replace one comma per match: the regex consumes the rest of the quoted text (commas included), and text that has already been consumed cannot be matched and substituted again, unless you run a loop until there are no more commas to replace.
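For completeness, here is a rough sketch of the "run a loop" workaround mentioned above, assuming the string contains exactly one quoted section. The pattern and the function name are hypothetical, not from the answer: each pass consumes the quoted section and rewrites one comma in it, so re.subn is called until it reports zero substitutions. With more than one quoted section, this pattern can also match from a closing quote to the next opening quote, which is why the lookaround approach is preferable.

```python
import re

def replace_commas_in_quotes_loop(text, repl='@'):
    """Replace commas inside a single "..." section, one comma per pass.

    Assumes the text contains exactly one pair of double quotes.
    """
    # Group 1: opening quote up to the last comma; group 2: the rest up to the closing quote.
    pattern = re.compile(r'("[^"]*),([^"]*")')
    while True:
        # A function replacement avoids any backslash/group escaping issues in repl.
        text, count = pattern.subn(lambda m: m.group(1) + repl + m.group(2), text)
        if count == 0:   # no comma left between the quotes
            return text

print(replace_commas_in_quotes_loop('x, "a, b, c", y'))   # x, "a@ b@ c", y
```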

To replace those commas with re.sub, you can use lookarounds (there is plenty of documentation about them if you search for the term). If you have only one pair of double quotes, you can make sure that only commas followed by exactly one double quote are replaced:

,(?=[^"]*"[^"]*$)

[^"] means any character that is not a double quote, and [^"]* means such characters repeated 0 or more times.

The $ means the end of the string (for single-line input, the end of the line).

Finally, the lookahead (?= ... ) checks that the text after the comma matches what is inside it, without consuming any characters, so only the comma itself is replaced.

After that, you can simply replace the commas with whatever value you want:

import re

# text holds the line to process (naming it str would shadow the built-in type)
text = re.sub(r',(?=[^"]*"[^"]*$)', '@', text)
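A self-contained version of the single-pair replacement, run on a shortened stand-in for the question's input (a hypothetical sample). It assumes the line contains exactly one pair of double quotes, as the answer states; the leading and trailing commas survive because two quotes, or none, follow them.

```python
import re

# Shortened stand-in for the question's input (hypothetical sample).
text = ',"Completely renovated in 2009, the hotel, with 168 rooms.",'

# A comma is replaced only if exactly one double quote remains before the end.
text = re.sub(r',(?=[^"]*"[^"]*$)', '@', text)
print(text)  # ,"Completely renovated in 2009@ the hotel@ with 168 rooms.",
```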

If, however, there are multiple pairs of double quotes, you should make sure that there is an odd number of double quotes ahead of the comma. This can be done with the regex:

,(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)

By the way, (?: ... ) is a non-capturing group.
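To see the odd-count lookahead at work on a line with several quoted fields, the sketch below (the sample string and function names are hypothetical) applies it and cross-checks the result against a plain character loop that toggles an "inside quotes" flag at every double quote, a non-regex technique swapped in here only for comparison. Both assume the quotes are balanced and that there are no escaped quotes.

```python
import re

# Odd number of double quotes ahead => the comma sits inside a quoted field.
ODD_QUOTES_AHEAD = re.compile(r',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)')

def replace_with_regex(text, repl='@'):
    return ODD_QUOTES_AHEAD.sub(repl, text)

def replace_with_loop(text, repl='@'):
    """Non-regex cross-check: flip a flag on every double quote."""
    inside = False
    out = []
    for ch in text:
        if ch == '"':
            inside = not inside
        out.append(repl if ch == ',' and inside else ch)
    return ''.join(out)

sample = 'a, "b, c", d, "e, f", g'
print(replace_with_regex(sample))   # a, "b@ c", d, "e@ f", g
assert replace_with_regex(sample) == replace_with_loop(sample)
```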

#3


You can try this (it's quite a deadly-looking regex, though). The trick is that any character inside a pair of double quotes is followed by an odd number of double quotes, assuming, of course, that your double quotes are balanced:

import re

s = 'some comma , outside "Some comma , inside" , "Completely , renovated in 2009",'
s = re.sub(r',(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)', "@", s)
print(s)

Output:

some comma , outside "Some comma @ inside" , "Completely @ renovated in 2009",
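The "balanced quotes" assumption matters. In this sketch (the sample strings are hypothetical), a stray unmatched quote flips the parity, so the same regex replaces a comma outside the quotes and leaves the one after the quote alone.

```python
import re

PATTERN = r',(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)'

balanced = 'x, "a, b", y'
unbalanced = 'x, "a, b'          # closing quote missing

print(re.sub(PATTERN, '@', balanced))     # x, "a@ b", y
print(re.sub(PATTERN, '@', unbalanced))   # x@ "a, b
```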

#4


If the pattern is always as stated, the following code snippet will do what you want:

text = ',' + text[1:-1].replace(',', '@') + ','

Discussion

  • text[1:-1] gives you the original string minus its first and last characters (the commas)
  • We then call .replace() to turn all the commas into @ signs
  • Finally, we put back the first and last commas to form the resulting string
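A sketch of the slicing idea wrapped in a function (the name is hypothetical), with a guard that the input really has the stated shape: it must start with ," and end with ",. Slicing with [1:-1] drops exactly one character from each end, which keeps both double quotes; [1:-2] would also drop the closing quote.

```python
def replace_inner_commas(text, repl='@'):
    """Replace the commas between the leading and trailing delimiter commas."""
    # Assumption: the whole string has the shape ,"...", as in the question.
    if not (text.startswith(',"') and text.endswith('",')):
        raise ValueError('expected the string to start with ," and end with ",')
    # text[1:-1] strips one character (the delimiting comma) from each end.
    return ',' + text[1:-1].replace(',', repl) + ','

print(replace_inner_commas(',"Completely renovated in 2009, the hotel, with 168 rooms.",'))
# ,"Completely renovated in 2009@ the hotel@ with 168 rooms.",
```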
