I would like to parse a string like this:
-o 1 --long "Some long string"
into this:
["-o", "1", "--long", 'Some long string']
or similar.
This is different from getopt or optparse, which start from input that has already been split into sys.argv form (like the output above). Is there a standard way to do this? Basically, this is splitting on whitespace while keeping quoted strings together.
My best function so far:
import csv
import io

def split_quote(string, quotechar='"'):
    '''
    >>> split_quote('--blah "Some argument" here')
    ['--blah', 'Some argument', 'here']
    >>> split_quote("--blah 'Some argument' here", quotechar="'")
    ['--blah', 'Some argument', 'here']
    '''
    # io.StringIO is used here because csv.StringIO is not part of the
    # csv module's documented API.
    s = io.StringIO(string)
    reader = csv.reader(s, delimiter=" ", quotechar=quotechar)
    return list(reader)[0]
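A quick check of where the csv-based approach diverges from shell-style splitting may help here. This is only a sketch: it repeats the function above (written with io.StringIO, an assumption for Python 3) and feeds it an input with two consecutive spaces, which csv treats as an empty field between two delimiters.

```python
import csv
import io

def split_quote(string, quotechar='"'):
    # Same idea as the function above: treat the line as space-delimited CSV.
    reader = csv.reader(io.StringIO(string), delimiter=" ", quotechar=quotechar)
    return list(reader)[0]

quoted = split_quote('--blah "Some argument" here')
doubled = split_quote('a  b')  # two spaces between a and b

print(quoted)   # ['--blah', 'Some argument', 'here']
print(doubled)  # ['a', '', 'b']
```

A shell would collapse the repeated spaces, so the empty string in the second result is one reason this approach is only an approximation.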
2 solutions
#1
71
I believe you want the shlex module.
>>> import shlex
>>> shlex.split('-o 1 --long "Some long string"')
['-o', '1', '--long', 'Some long string']
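Since the question mentions getopt and optparse, here is a minimal sketch of passing the shlex.split result to an option parser. It uses argparse rather than optparse, and the option definitions (-o as an integer, --long as a string) are assumptions made up for illustration; they are not from the question.

```python
import argparse
import shlex

# Hypothetical option definitions, chosen to match the example string.
parser = argparse.ArgumentParser()
parser.add_argument("-o", type=int)
parser.add_argument("--long")

# Split the raw string the way a POSIX shell would, then parse the list.
argv = shlex.split('-o 1 --long "Some long string"')
args = parser.parse_args(argv)

print(argv)               # ['-o', '1', '--long', 'Some long string']
print(args.o, args.long)  # 1 Some long string
```

Passing the list explicitly to parse_args means sys.argv is never consulted, which is the point of splitting the string yourself.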
#2
0
Before I was aware of shlex.split, I made the following:
import sys

_WORD_DIVIDERS = set((' ', '\t', '\r', '\n'))

_QUOTE_CHARS_DICT = {
    '\\': '\\',
    ' ': ' ',
    '"': '"',
    'r': '\r',
    'n': '\n',
    't': '\t',
}

def _raise_type_error():
    raise TypeError("Bytes must be decoded to Unicode first")

def parse_to_argv_gen(instring):
    is_in_quotes = False
    instring_iter = iter(instring)
    join_string = instring[0:0]
    c_list = []
    c = ' '
    while True:
        # Skip whitespace
        try:
            while True:
                if not isinstance(c, str) and sys.version_info[0] >= 3:
                    _raise_type_error()
                if c not in _WORD_DIVIDERS:
                    break
                c = next(instring_iter)
        except StopIteration:
            break
        # Read word
        try:
            while True:
                if not isinstance(c, str) and sys.version_info[0] >= 3:
                    _raise_type_error()
                if not is_in_quotes and c in _WORD_DIVIDERS:
                    break
                if c == '"':
                    is_in_quotes = not is_in_quotes
                    c = None
                elif c == '\\':
                    c = next(instring_iter)
                    c = _QUOTE_CHARS_DICT.get(c)
                if c is not None:
                    c_list.append(c)
                c = next(instring_iter)
            yield join_string.join(c_list)
            c_list = []
        except StopIteration:
            yield join_string.join(c_list)
            break

def parse_to_argv(instring):
    return list(parse_to_argv_gen(instring))
This works with Python 2.x and 3.x. On Python 2.x, it works directly with byte strings and Unicode strings. On Python 3.x, it only accepts [Unicode] strings, not bytes objects.
This doesn't behave exactly the same as shell argv splitting: it also allows CR, LF and TAB characters to be written as the escapes \r, \n and \t, and converts them to real CR, LF and TAB characters (shlex.split doesn't do that). So writing my own function was useful for my needs. I guess shlex.split is better if you just want plain shell-style argv splitting. I'm sharing this code in case it's useful as a baseline for doing something slightly different.
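To illustrate the difference described above, here is a small sketch of how shlex.split (in its default POSIX mode) treats the same escapes. The input string is made up for illustration. Outside quotes the backslash is dropped and the following letter is kept; inside double quotes the backslash is kept literally. In neither case is a real TAB or LF produced.

```python
import shlex

# Raw string: the backslashes reach shlex.split unchanged.
raw = r'a\tb "c\nd"'
parts = shlex.split(raw)

print(parts)  # ['atb', 'c\\nd']

# No control characters appear in the result.
print(any('\t' in p or '\n' in p for p in parts))  # False
```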