I have the following DataFrame. I am wondering whether it is possible to break the "data" column into multiple columns. E.g., from this:
ID  Date        data
6   21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8
6   21/01/2014  B: 5, C: 5, D: 7
6   02/04/2013  A: 4, D:7
7   05/06/2014  C: 25
7   12/08/2014  D: 20
8   18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4
8   21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4
into this:
ID  Date        data                          A   B   C   D   E   F
6   21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8  15  8   5   5   0   0
6   21/01/2014  B: 5, C: 5, D: 7              0   5   5   7   0   0
6   02/04/2013  B: 4, D: 7, B: 6              0   10  0   7   0   0
7   05/06/2014  C: 25                         0   0   25  0   0   0
7   12/08/2014  D: 20                         0   0   0   20  0   0
8   18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4  2   7   3   0   5   0
8   21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4  0   8   0   6   0   11
I have tried the approaches from "pandas split string into columns" and "pandas: How do I split text in a column into multiple rows?", but they do not work in my case.
EDIT
There is one complication: the "data" column can contain duplicate keys. For example, in the first row "A" appears twice, so its values are summed under the "A" column (please see the second table).
2 Answers
#1
6
Here is a function that converts the string to a dictionary, summing the values that share a key. After the conversion, the dictionaries can be expanded into columns with pd.Series:
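import re
from collections import defaultdict

import pandas as pd

def str_to_dict(str1):
    # Pair each capital-letter key with the number that follows it and sum
    # per key, so a repeated key such as "A: 7, ..., A: 8" adds up to 15.
    d = defaultdict(int)
    for k, v in zip(re.findall(r'[A-Z]', str1), re.findall(r'\d+', str1)):
        d[k] += int(v)
    return d

# 'dictionary' is the column holding the strings (named "data" in the question).
pd.concat([df, df['dictionary'].apply(str_to_dict).apply(pd.Series).fillna(0).astype(int)], axis=1)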
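To check the function against the question's expected table, here is a minimal sketch that runs it on three rows copied from the question. The column name "data" follows the question rather than the 'dictionary' name used above, and the names `expanded` and `result` are only illustrative. It builds the new columns with pd.DataFrame on a list of dictionaries instead of .apply(pd.Series), which is assumed to give the same columns, and sorts them alphabetically.

```python
import re
from collections import defaultdict

import pandas as pd


def str_to_dict(str1):
    """Same helper as above: sum the numbers found after each capital-letter key."""
    d = defaultdict(int)
    for k, v in zip(re.findall(r"[A-Z]", str1), re.findall(r"\d+", str1)):
        d[k] += int(v)
    return dict(d)


# Three rows taken from the question's first table.
df = pd.DataFrame({
    "ID": [6, 7, 8],
    "Date": ["21/05/2016", "05/06/2014", "21/03/2012"],
    "data": [
        "A: 7, B: 8, C: 5, D: 5, A: 8",
        "C: 25",
        "F: 6, B: 4, F: 5, D: 6, B: 4",
    ],
})

# One column per key; keys missing from a row become 0.
expanded = (
    pd.DataFrame(df["data"].map(str_to_dict).tolist(), index=df.index)
    .fillna(0)
    .astype(int)
    .sort_index(axis=1)
)
result = pd.concat([df, expanded], axis=1)
print(result)
```

Only keys that occur in at least one row become columns, so this three-row subset has no "E" column; the full table from the question would produce all of A through F.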
#2
3
import pandas as pd

df = pd.DataFrame([
    [6, "a: 1, b: 2"],
    [6, "a: 1, b: 2"],
    [6, "a: 1, b: 2"],
    [6, "a: 1, b: 2"],
], columns=['ID', 'dictionary'])
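def str2dict(s):
    # Split "a: 1, b: 2" into pairs, then each pair into key and value.
    d = {}
    for pair in s.strip().split(','):
        k, v = [part.strip() for part in pair.split(':')]
        # Convert to int and add, so a repeated key is summed instead of overwritten.
        d[k] = d.get(k, 0) + int(v)
    return d

df.dictionary.apply(str2dict).apply(pd.Series)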
Or, to keep the original columns alongside the new ones:
pd.concat([df, df.dictionary.apply(str2dict).apply(pd.Series)], axis=1)
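As an alternative to writing a parsing function, the same table can probably be built with pandas string methods alone. The sketch below is not from either answer. It assumes the column is named "data" as in the question and that every entry has the form "key: number"; the regular expression and the names `matches`, `wide`, and `result` are illustrative choices.

```python
import pandas as pd

# Three rows taken from the question's first table.
df = pd.DataFrame({
    "ID": [6, 7, 8],
    "data": [
        "A: 7, B: 8, C: 5, D: 5, A: 8",
        "C: 25",
        "F: 6, B: 4, F: 5, D: 6, B: 4",
    ],
})

# One row per "key: value" match; the first index level is the original row label.
matches = df["data"].str.extractall(r"(?P<key>[A-Za-z]+)\s*:\s*(?P<value>\d+)")
matches["value"] = matches["value"].astype(int)

# Sum per (row, key), then pivot the keys into columns.
wide = (
    matches.groupby([matches.index.get_level_values(0), "key"])["value"]
    .sum()
    .unstack(fill_value=0)
    .reindex(df.index, fill_value=0)  # keep rows that had no matches at all
)
result = df.join(wide)
print(result)
```

Because the grouping sums per row and key, repeated keys such as the two "A" entries in the first row are added together, which is the behaviour the question's EDIT asks for.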