I would like to split each row into new columns based on several indices:
6ABCDE0218594STRING
to
6 ABCDE 021 8594 STRING
This seems like it would have been asked at least once before, but I keep finding only variations on the question: separating by a delimiter (as in "pandas: How do I split text in a column into multiple rows?"), or separating into new rows rather than new columns, again with a delimiter ("Split pandas dataframe string entry to separate rows").
I apologize in advance if this is a duplicate!
3 Answers
#1
2
One way is to use a regex and str.extract to pull out the columns:
In [10]: import pandas as pd
In [11]: df = pd.DataFrame([['6ABCDE0218594STRING']])
You could split purely by position, with something like this:
In [12]: df[0].str.extract(r'(.)(.{5})(.{3})(.{4})(.*)')
Out[12]:
   0      1    2     3       4
0  6  ABCDE  021  8594  STRING
Or you could be a bit more cautious and ensure each column is the correct form:
In [13]: df[0].str.extract(r'(\d)(.{5})(\d{3})(\d{4})(.*)')
Out[13]:
   0      1    2     3       4
0  6  ABCDE  021  8594  STRING
Note: You can also use named groups (see the docs).
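As a rough illustration of the named-groups note, here is a minimal sketch. The column name `raw` and the group names (`kind`, `code`, `area`, `number`, `rest`) are made up for this example and are not from the question; `str.extract` uses the group names as the column labels of the result.

```python
import pandas as pd

# "raw" is a hypothetical column name chosen for this sketch.
df = pd.DataFrame({"raw": ["6ABCDE0218594STRING"]})

# Each (?P<name>...) group becomes a column labelled with that name.
pattern = r"(?P<kind>\d)(?P<code>.{5})(?P<area>\d{3})(?P<number>\d{4})(?P<rest>.*)"
parts = df["raw"].str.extract(pattern)

print(parts.columns.tolist())
print(parts.iloc[0].tolist())
```

Rows that do not match the pattern should come back as missing values rather than raising, so the stricter pattern also acts as a basic validity check.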
#2
0
Try this:
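string = '6ABCDE0218594STRING'
indices = [1, 5, 3, 4]  # length of each leading field, not absolute positions
myList = []
for index in indices:
    # Peel the next field off the front of the remaining string
    token, string = string[:index], string[index:]
    myList.append(token)
myList.append(string)  # whatever is left becomes the last field
print(myList)
Output: ['6', 'ABCDE', '021', '8594', 'STRING']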
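The same slicing idea can be wrapped in a small helper and applied to several rows at once. This is only a sketch: `split_by_widths` is a hypothetical name, and the second row of sample data is invented for illustration.

```python
from itertools import accumulate

def split_by_widths(text, widths):
    """Split text into len(widths) fixed-width fields plus the remainder."""
    # Running totals turn widths [1, 5, 3, 4] into cut points [0, 1, 6, 9, 13].
    cuts = [0, *accumulate(widths)]
    fields = [text[a:b] for a, b in zip(cuts, cuts[1:])]
    fields.append(text[cuts[-1]:])  # everything after the last cut
    return fields

# The second row is made-up sample data, not from the question.
rows = ["6ABCDE0218594STRING", "7FGHIJ0359999OTHER"]
table = [split_by_widths(row, [1, 5, 3, 4]) for row in rows]
print(table[0])
```

The resulting list of lists can be passed to `pd.DataFrame(table)` if the pieces are needed as columns.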
#3
0
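Or, if you don't know how many digits or letters each field has, match on character classes instead (adjacent digit fields such as 021 and 8594 then come out as a single group):
import re
m = re.match(r'(\d*)([A-Z]*)(\d*)([A-Z]*)', '6ABCDE0218594STRING').groups()
print(m[0], m[1], m[2], m[3])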
Output:
6 ABCDE 0218594 STRING