Python: split a string, respecting and preserving quotes [duplicate]

Date: 2021-08-30 02:43:08

This question already has an answer here:

Using Python, I want to split the following string:

a=foo, b=bar, c="foo, bar", d=false, e="false"

This should result in the following list:

['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"']

When using shlex in POSIX mode and splitting on ", ", the argument for c gets treated correctly. However, shlex removes the quotes. I need them because false is not the same as "false", for instance.

My code so far:

import shlex

mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'

splitter = shlex.shlex(mystring, posix=True)
splitter.whitespace += ','
splitter.whitespace_split = True
print(list(splitter))  # ['a=foo', 'b=bar', 'c=foo, bar', 'd=false', 'e=false']

2 answers

#1


19  

>>> import re
>>> s = r'a=foo, b=bar, c="foo, bar", d=false, e="false", f="foo\", bar"'
>>> re.findall(r'(?:[^\s,"]|"(?:\\.|[^"])*")+', s)
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"', 'f="foo\\", bar"']
  1. The regex pattern "[^"]*" matches a simple quoted string.
  2. "(?:\\.|[^"])*" matches a quoted string and skips over escaped quotes, because \\. consumes two characters: a backslash and any character (other than a newline).
  3. [^\s,"] matches a non-delimiter.
  4. Combining patterns 2 and 3 inside (?: | )+ matches a sequence of non-delimiters and quoted strings, which is the desired result.
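
As a rough sketch of how the pattern described above could be reused, it can be compiled once and wrapped in a small helper. The names TOKEN_RE and split_keep_quotes are made up for this illustration and are not part of the original answer:

```python
import re

# Pattern from the answer: a run of non-delimiter characters and/or
# double-quoted strings (backslash-escaped quotes are allowed inside).
TOKEN_RE = re.compile(r'(?:[^\s,"]|"(?:\\.|[^"])*")+')


def split_keep_quotes(text):
    """Return the tokens between commas/whitespace, keeping quoted parts intact.

    Hypothetical helper name, used only for this example.
    """
    return TOKEN_RE.findall(text)


print(split_keep_quotes('a=foo, b=bar, c="foo, bar", d=false, e="false"'))
# ['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"']
```

Because the pattern has no capturing groups, re.findall returns each whole match, so the quotes stay in the output.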

#2


0  

Regex can solve this easily enough:

import re

mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'

splitString = re.split(r',?\s(?=\w+=)', mystring)

The regex pattern here matches an optional comma followed by a whitespace character, but only where a lookahead finds one or more word characters and then an equals sign. Splitting at those points separates the key=value pairs as you desire and keeps any quotes.
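
A minimal sketch of the lookahead split described above, together with one input where it may not do what you want. The names PAIR_SEP_RE and split_pairs are hypothetical and chosen only for this example:

```python
import re

# Split on an optional comma plus one whitespace character, but only when
# the text right after it looks like the start of a new "key=" pair.
PAIR_SEP_RE = re.compile(r',?\s(?=\w+=)')


def split_pairs(text):
    """Split text into key=value chunks (hypothetical helper name)."""
    return PAIR_SEP_RE.split(text)


print(split_pairs('a=foo, b=bar, c="foo, bar", d=false, e="false"'))
# ['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"']

# Caveat: the pattern does not track quotes, so a quoted value that
# itself contains ", word=" is split as well.
print(split_pairs('a=1, c="x, y=1"'))
# ['a=1', 'c="x', 'y=1"']
```

If quoted values can contain text that looks like another key=value pair, a pattern that matches quoted strings explicitly is probably the safer choice.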