I have the output of a command in tabular form. I'm parsing this output from a result file and storing it in a string. The elements in each row are separated by one or more whitespace characters, so I'm using a regular expression to match one or more spaces and split on it. However, a space element shows up between every pair of elements in the result:
>>> import re
>>> str1="a  b   c    d" # spaces are irregular
>>> str1
'a  b   c    d'
>>> str2=re.split("( )+", str1)
>>> str2
['a', ' ', 'b', ' ', 'c', ' ', 'd'] # 1 space element between!!!
Is there a better way to do this?
After each split, str2 is appended to a list.
4 Answers
#1
106
By using ( and ), you are creating a capturing group, and re.split includes captured text in its result. If you simply remove the parentheses, you will not have this problem.
>>> str1 = "a b c d"
>>> re.split(" +", str1)
['a', 'b', 'c', 'd']
However, there is no need for a regex here: str.split without any delimiter specified will split on whitespace for you. This would be the best way in this case.
>>> str1.split()
['a', 'b', 'c', 'd']
If you really want a regex, you can use this ('\s' matches any whitespace character, which makes the intent clearer):
>>> re.split(r"\s+", str1)
['a', 'b', 'c', 'd']
Or you can find all runs of non-whitespace characters:
>>> re.findall(r'\S+',str1)
['a', 'b', 'c', 'd']
#2
13
The str.split method, called without arguments, automatically treats any run of whitespace between items as a single separator:
>>> str1 = "a b c d"
>>> str1.split()
['a', 'b', 'c', 'd']
Docs are here: http://docs.python.org/library/stdtypes.html#str.split
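Applied to the tabular output described in the question, this could look roughly like the sketch below. The sample text and the names output and rows are made up for illustration; the real rows would come from the result file.

```python
# Hypothetical sample of tabular command output (assumed for illustration).
output = """\
name   size   owner
a.txt    12   root
b.txt  3456   alice
"""

rows = []
for line in output.splitlines():
    fields = line.split()  # no argument: split on any run of whitespace
    if fields:             # skip blank lines
        rows.append(fields)

print(rows)
```

Each row ends up as its own list of strings, regardless of how many spaces or tabs separated the columns.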
#3
6
When you use re.split and the split pattern contains capturing groups, the text matched by those groups is retained in the output. If you don't want this, use a non-capturing group, (?:...), instead.
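As a small sketch of the difference, using the pattern from the question (the variable names with_capture and without_capture are only for illustration):

```python
import re

str1 = "a  b   c    d"  # irregular runs of spaces

# Capturing group: the last space matched by the group is kept in the result.
with_capture = re.split(r"( )+", str1)

# Non-capturing group (?:...): the separators are dropped.
without_capture = re.split(r"(?: )+", str1)

print(with_capture)     # ['a', ' ', 'b', ' ', 'c', ' ', 'd']
print(without_capture)  # ['a', 'b', 'c', 'd']
```

For a single-character pattern like this one the group is not needed at all, but the non-capturing form is useful when the separator pattern genuinely requires grouping.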
#4
1
It's very simple, actually. Try this:
str1 = "a b c d"
splitStr1 = str1.split()  # split on any run of whitespace
print(splitStr1)