How do I reliably split a string in Python?

Time: 2021-08-14 21:36:59

In Perl I can do:

my ($x, $y) = split /:/, $str;

And it will work whether or not the string contains the pattern.

In Python, however, this won't work:

a, b = "foo".split(":")  # ValueError: not enough values to unpack

What's the canonical way to prevent errors in such cases?

5 Solutions

#1


107  

If you're splitting into just two parts (as in your example), you can use str.partition(), which is guaranteed to return exactly three values to unpack:

>>> a, sep, b = "foo".partition(":")
>>> a, sep, b
('foo', '', '')

str.partition() always returns a 3-tuple, whether the separator is found or not.
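
As a quick illustration of that guarantee, the sketch below runs partition() on three made-up inputs: no separator, one separator, and several. Note that only the first separator is used, so the tail keeps any later ones, and an empty sep tells you that nothing was found.

```python
# Sample inputs are made up for illustration.
samples = ["foo", "foo:bar", "foo:bar:baz"]

# partition() always yields (head, sep, tail), whatever the input.
results = [s.partition(":") for s in samples]

for s, (head, sep, tail) in zip(samples, results):
    print(repr(s), "->", repr(head), repr(sep), repr(tail))

# 'foo' -> 'foo' '' ''
# 'foo:bar' -> 'foo' ':' 'bar'
# 'foo:bar:baz' -> 'foo' ':' 'bar:baz'
```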

Another alternative for Python 3 is to use extended unpacking, as described in @cdarke's answer:

>>> a, *b = "foo".split(":")
>>> a, b
('foo', [])

This assigns the first split item to a and the list of remaining items (if any) to b.
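
One thing to keep in mind is that b is always a list here, even when exactly one item remains. If a plain string is preferred, one possible variation (not part of the answer above) is to cap the split with maxsplit=1 and take whatever is left. The helper name split_first is hypothetical.

```python
def split_first(text, sep=":"):
    """Return (head, rest); rest is '' when sep is absent.

    split_first is a hypothetical helper name used only for this sketch.
    """
    # maxsplit=1 means at most one split, so rest holds 0 or 1 items.
    head, *rest = text.split(sep, 1)
    return head, (rest[0] if rest else "")


print(split_first("foo"))          # ('foo', '')
print(split_first("foo:bar"))      # ('foo', 'bar')
print(split_first("foo:bar:baz"))  # ('foo', 'bar:baz')
```

This gives the same head and tail as partition(), minus the separator.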

#2


60  

Since you are on Python 3, this is easy. PEP 3132 introduced a welcome simplification of the syntax for assigning to tuples: extended iterable unpacking. Previously, when assigning to a tuple of variables, the number of names on the left of the assignment had to equal exactly the number of values on the right.

In Python 3 we can designate one variable on the left as a list by prefixing it with an asterisk (*). It grabs as many values as it can while still populating the variables to its right (so it need not be the rightmost item). This avoids many nasty slices when we don't know the length of a tuple.

a, *b = "foo".split(":")  
print("a:", a, "b:", b)

Gives:

a: foo b: []

EDIT following comments and discussion:

Compared with the Perl version this is considerably different, but it is the Python (3) way. re.split() would be more similar to the Perl version; however, invoking the RE engine to split around a single character is unnecessary overhead.

With multiple elements in Python:

s = 'hello:world:sailor'
a, *b = s.split(":")
print("a:", a, "b:", b)

gives:

a: hello b: ['world', 'sailor']

However, in Perl:

my $s = 'hello:world:sailor';
my ($a, $b) = split /:/, $s;
print "a: $a b: $b\n";

gives:

a: hello b: world

It can be seen that additional elements are ignored, or lost, in Perl. That is fairly easy to replicate in Python if required:

s = 'hello:world:sailor'
a, *b = s.split(":")
b = b[0] if b else None  # keep only the second field; None when there is no separator
print("a:", a, "b:", b)

So the Perl equivalent of a, *b = s.split(":") would be:

my ($a, @b) = split /:/, $s;

NB: we shouldn't use $a and $b in general Perl since they have a special meaning when used with sort. I have used them here for consistency with the Python example.

Python does have an extra trick up its sleeve. The starred name can appear at any position in the tuple on the left:

s = "one:two:three:four"
a, *b, c = s.split(':')
print("a:", a, "b:", b, "c:", c)

Gives:

a: one b: ['two', 'three'] c: four

Whereas in the Perl equivalent, the array (@b) is greedy, and the scalar $c is undef:

use strict;
use warnings;

my $s = 'one:two:three:four';
my ($a, @b, $c) = split /:/, $s;
print "a: $a b: @b c: $c\n";

Gives:

Use of uninitialized value $c in concatenation (.) or string at gash.pl line 8.
a: one b: two three four c: 

#3


22  

You are always free to catch the exception.

For example:

some_string = "foo"

try:
    a, b = some_string.split(":")
except ValueError:
    a = some_string
    b = ""

If assigning the whole original string to a and an empty string to b is the desired behaviour, I would probably use str.partition() as eugene y suggests. However, this solution gives you more control over exactly what happens when there is no separator in the string, which might be useful in some cases.
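
To illustrate the kind of control meant here, the sketch below wraps the same try/except in a function that either falls back to a caller-supplied default or raises a more descriptive error. The function name parse_pair and its default parameter are hypothetical, not part of the answer above.

```python
_MISSING = object()  # sentinel so that None can still be passed as a real default


def parse_pair(text, sep=":", default=_MISSING):
    """Split text into exactly two parts around sep (hypothetical helper)."""
    try:
        a, b = text.split(sep)
    except ValueError:
        # Reached when sep is absent, and also when it occurs more than once.
        if default is _MISSING:
            raise ValueError(f"expected exactly one {sep!r} in {text!r}") from None
        return text, default
    return a, b


print(parse_pair("key:value"))        # ('key', 'value')
print(parse_pair("foo", default=""))  # ('foo', '')
```

Whether a string with too many separators should take the same fallback as one with none is a design choice; this sketch treats both the same way, as the original try/except does.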

#4


17  

split() always returns a list, and a, b = ... always expects that list to have exactly two items. You can use something like l = string.split(':'); a = l[0]; ....

Here is a one-liner: a, b = (string.split(':') + [None]*2)[:2]
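
The one-liner pads the list with None and then truncates it, so the unpacking always sees exactly two items. A sketch of the same idea generalised to n fields follows; split_fixed and its parameters are hypothetical names chosen for illustration.

```python
def split_fixed(text, sep=":", n=2, fill=None):
    """Return exactly n fields: extras are dropped, missing ones become fill."""
    parts = text.split(sep)
    # Pad with n fill values, then cut back to n items.
    return (parts + [fill] * n)[:n]


a, b = split_fixed("foo")
print(a, b)  # foo None

a, b = split_fixed("hello:world:sailor")
print(a, b)  # hello world
```

Dropping the extra fields matches what the Perl snippet in the question does when it assigns to two scalars.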

#5


3  

How about using regular expressions:

import re 
string = 'one:two:three:four'

In 3.x:

a, *b = re.split(':', string)

In 2.x:

a, b = re.split(':', string)[0], re.split(':', string)[1:]

This way you can also split on a regular expression pattern (e.g. \d).
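
For instance, splitting on any single digit rather than on a fixed character could look like the sketch below. The sample strings are made up.

```python
import re

text = "one1two2three3four"  # made-up sample input
parts = re.split(r"\d", text)
a, *b = parts
print("a:", a, "b:", b)
# a: one b: ['two', 'three', 'four']

# As with str.split(), a string without a match yields a single-item list.
a, *b = re.split(r"\d", "foo")
print("a:", a, "b:", b)
# a: foo b: []
```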
