pandas DataFrame: splitting one column into multiple columns

Time: 2021-03-13 22:34:18

I have the following DataFrame. I am wondering whether it is possible to break the "data" column into multiple columns. E.g., from this:

ID       Date       data
6       21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8
6       21/01/2014  B: 5, C: 5, D: 7
6       02/04/2013  A: 4, D:7
7       05/06/2014  C: 25
7       12/08/2014  D: 20
8       18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4
8       21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4  

into this:

ID       Date       data                            A   B   C   D   E   F
6       21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8    15  8   5   5   0   0
6       21/01/2014  B: 5, C: 5, D: 7                0   5   5   7   0   0     
6       02/04/2013  B: 4, D: 7, B: 6                0   10  0   7   0   0
7       05/06/2014  C: 25                           0   0   25  0   0   0
7       12/08/2014  D: 20                           0   0   0   20  0   0   
8       18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4    2   7   3   0   5   0
8       21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4    0   8   0   6   0   11

I have tried the approaches from "pandas split string into columns" and "pandas: How do I split text in a column into multiple rows?", but they do not work in my case.

EDIT

There is one complication: the "data" column can contain duplicate keys. For example, in the first row "A" is repeated, so its values are summed under the "A" column (please see the second table).

2 Answers

#1


6  

Here is a function that converts the string to a dictionary and sums the values by key. After the conversion, the result is easy to expand into columns with pd.Series:

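import re
from collections import defaultdict

import pandas as pd

def str_to_dict(str1):
    # Pair each uppercase key with the number that follows it and sum repeats,
    # e.g. "A: 7, B: 8, A: 8" gives {'A': 15, 'B': 8}
    d = defaultdict(int)
    for k, v in zip(re.findall(r'[A-Z]', str1), re.findall(r'\d+', str1)):
        d[k] += int(v)
    return d

# The question's column is named "data"; keys missing from a row become 0
pd.concat([df, df['data'].apply(str_to_dict).apply(pd.Series).fillna(0).astype(int)], axis=1)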
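
For anyone who wants to try this end to end, here is a minimal, self-contained sketch using the rows from the first table in the question. It makes two small changes to the function above, both my own assumptions rather than part of the answer: each key is matched together with its number in a single regular expression, and the columns are built with pd.DataFrame from a list of dictionaries instead of .apply(pd.Series). The names counts and result are only illustrative.

```python
import re
from collections import defaultdict

import pandas as pd


def str_to_dict(text):
    """Sum the integers that follow each single uppercase-letter key."""
    totals = defaultdict(int)
    # Each match is a (key, number) tuple, e.g. ("A", "7").
    for key, value in re.findall(r"([A-Z])\s*:\s*(\d+)", text):
        totals[key] += int(value)
    return dict(totals)


# Sample rows copied from the first table in the question.
df = pd.DataFrame(
    {
        "ID": [6, 6, 6, 7, 7, 8, 8],
        "Date": ["21/05/2016", "21/01/2014", "02/04/2013", "05/06/2014",
                 "12/08/2014", "18/04/2012", "21/03/2012"],
        "data": [
            "A: 7, B: 8, C: 5, D: 5, A: 8",
            "B: 5, C: 5, D: 7",
            "A: 4, D:7",
            "C: 25",
            "D: 20",
            "A: 2, B: 3, C: 3, E: 5, B: 4",
            "F: 6, B: 4, F: 5, D: 6, B: 4",
        ],
    }
)

# One dictionary per row becomes one column per key; absent keys become 0.
counts = (
    pd.DataFrame(df["data"].map(str_to_dict).tolist(), index=df.index)
    .fillna(0)
    .astype(int)
    .sort_index(axis=1)
)
result = pd.concat([df, counts], axis=1)
print(result)
```

The sort_index(axis=1) call is only there to put the new columns in alphabetical order, matching the layout of the second table.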

#2


3  

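import pandas as pd

df = pd.DataFrame([
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
    ], columns=['ID', 'dictionary'])

def str2dict(s):
    # Split "k: v, k: v" into a plain dict. Values stay strings,
    # and a repeated key keeps only its last value.
    pairs = s.strip().split(',')
    d = {}
    for pair in pairs:
        k, v = [part.strip() for part in pair.split(':')]
        d[k] = v
    return d

# Expand each dictionary into one column per key
df.dictionary.apply(str2dict).apply(pd.Series)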

Or:

pd.concat([df, df.dictionary.apply(str2dict).apply(pd.Series)], axis=1)
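
Note that str2dict assigns with d[k] = v, so a repeated key overwrites the earlier value and the values remain strings. If the summing behaviour from the question's edit is needed, a split-based variant might look like the sketch below. The function name str2dict_sum and the two sample rows are hypothetical, made up for illustration, and the frame is built from a list of dictionaries rather than with .apply(pd.Series).

```python
import pandas as pd


def str2dict_sum(s):
    """Split "k: v" pairs on commas and add up values that share a key."""
    totals = {}
    for pair in s.split(","):
        key, value = (part.strip() for part in pair.split(":"))
        totals[key] = totals.get(key, 0) + int(value)
    return totals


# Hypothetical sample data with a repeated key in the first row.
df = pd.DataFrame(
    [[6, "a: 1, b: 2, a: 3"], [7, "b: 5"]],
    columns=["ID", "dictionary"],
)

# Keys missing from a row show up as NaN, so fill them with 0 before casting.
expanded = (
    pd.DataFrame(df["dictionary"].map(str2dict_sum).tolist(), index=df.index)
    .fillna(0)
    .astype(int)
)
print(pd.concat([df, expanded], axis=1))
```

Unlike the regex approach in answer #1, this accepts keys of any length, but it assumes every pair is well formed as "key: integer".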
