How to convert a column of value lists into rows in a pandas DataFrame

Time: 2022-09-02 00:21:20

Hi, I have a DataFrame like this:

    A             B 
0:  some value    [[L1, L2]]

I want to change it into:

    A             B 
0:  some value    L1
1:  some value    L2

How can I do that?

3 solutions

#1


You can do it this way:

In [84]: df
Out[84]:
               A               B
0     some value      [[L1, L2]]
1  another value  [[L3, L4, L5]]

In [85]: (df['B'].apply(lambda x: pd.Series(x[0]))
   ....:         .stack()
   ....:         .reset_index(level=1, drop=True)
   ....:         .to_frame('B')
   ....:         .join(df[['A']], how='left')
   ....: )
Out[85]:
    B              A
0  L1     some value
0  L2     some value
1  L3  another value
1  L4  another value
1  L5  another value
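On pandas 0.25 or newer, DataFrame.explode can do the row expansion directly. A minimal sketch, assuming every cell in B holds exactly one inner list (as in the question) and pandas 1.1+ for the ignore_index argument:

```python
import pandas as pd

# Sample data matching the session above: each B cell is a list wrapped in another list
df = pd.DataFrame({
    "A": ["some value", "another value"],
    "B": [[["L1", "L2"]], [["L3", "L4", "L5"]]],
})

# Strip the outer single-element list, then emit one row per inner element.
# ignore_index=True renumbers the result 0..n-1 instead of repeating the original index.
result = (
    df.assign(B=df["B"].map(lambda x: x[0]))
      .explode("B", ignore_index=True)
)
print(result)
```

Because assign replaces B in place, the column order stays A, B, whereas the join-based version above returns B, A.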

UPDATE: a more generic solution

#2


A faster solution using itertools.chain.from_iterable and numpy.repeat:

import numpy as np
import pandas as pd
from itertools import chain

df = pd.DataFrame({'A':['a','b'],
                   'B':[[['A1', 'A2']],[['A1', 'A2', 'A3']]]})

print (df)
   A               B
0  a      [[A1, A2]]
1  b  [[A1, A2, A3]]


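df1 = pd.DataFrame({ "A": np.repeat(df.A.values,
                                    # repeat each A value once per element of its row's inner list
                                    [len(x) for x in chain.from_iterable(df.B)]),
                     # flatten both list levels into one long column
                     "B": list(chain.from_iterable(chain.from_iterable(df.B)))})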

print (df1)
   A   B
0  a  A1
1  a  A2
2  b  A1
3  b  A2
4  b  A3
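The same np.repeat / chain.from_iterable idea can be wrapped in a small reusable function. This is a sketch: the name explode_nested and its parameters are hypothetical. Unlike the one-liner above, which assumes exactly one inner list per cell (otherwise the repeat counts no longer match the number of rows and np.repeat raises), it flattens each row separately, so a cell such as [['A1'], ['A2']] still lines up with its A value.

```python
from itertools import chain

import numpy as np
import pandas as pd


def explode_nested(df, list_col, other_col):
    """Hypothetical helper: one output row per innermost element of list_col.

    Assumes each cell of list_col is a list of lists, e.g. [['A1', 'A2']].
    """
    # Flatten each cell's outer wrapper so every row contributes one flat list
    flat_rows = [list(chain.from_iterable(cell)) for cell in df[list_col]]
    return pd.DataFrame({
        # np.repeat copies each value of other_col once per element of its row
        other_col: np.repeat(df[other_col].to_numpy(), [len(r) for r in flat_rows]),
        # chain.from_iterable concatenates the per-row lists in row order
        list_col: list(chain.from_iterable(flat_rows)),
    })


df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [[['A1', 'A2']], [['A1', 'A2', 'A3']]]})
out = explode_nested(df, 'B', 'A')
print(out)
```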

Timings:

import random
import string

A = np.unique(np.random.randint(0, 1000, 1000))
B = [[list(string.ascii_letters[:random.randint(3, 10)])] for _ in range(len(A))]
df = pd.DataFrame({"A":A, "B":B})
print (df)
       A                                 B
0      0        [[a, b, c, d, e, f, g, h]]
1      1                       [[a, b, c]]
2      3     [[a, b, c, d, e, f, g, h, i]]
3      5                 [[a, b, c, d, e]]
4      6     [[a, b, c, d, e, f, g, h, i]]
5      7           [[a, b, c, d, e, f, g]]
6      8              [[a, b, c, d, e, f]]
7     10              [[a, b, c, d, e, f]]
8     11           [[a, b, c, d, e, f, g]]
9     12     [[a, b, c, d, e, f, g, h, i]]
10    13        [[a, b, c, d, e, f, g, h]]
...
...

In [67]: %timeit pd.DataFrame({ "A": np.repeat(df.A.values, [len(x) for x in (chain.from_iterable(df.B))]),"B": list(chain.from_iterable(chain.from_iterable(df.B)))})
1000 loops, best of 3: 818 µs per loop

In [68]: %timeit ((df['B'].apply(lambda x: pd.Series(x[0])).stack().reset_index(level=1, drop=True).to_frame('B').join(df[['A']], how='left')))
10 loops, best of 3: 103 ms per loop
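To reproduce the comparison outside IPython, the standard-library timeit module can run both answers on the same synthetic data. A sketch: the function names with_repeat and with_stack are hypothetical, and absolute numbers will depend on the machine and pandas version. The extra .dropna() after stack() is an assumption added as a guard, since newer pandas stack() implementations may keep the NaN padding that apply(pd.Series) introduces for shorter rows.

```python
import random
import string
import timeit
from itertools import chain

import numpy as np
import pandas as pd

# Same synthetic data as the timings above: unique ints in A, one inner list of 3-10 letters in B
A = np.unique(np.random.randint(0, 1000, 1000))
B = [[list(string.ascii_letters[:random.randint(3, 10)])] for _ in range(len(A))]
df = pd.DataFrame({"A": A, "B": B})


def with_repeat():
    # Answer #2: np.repeat + chain.from_iterable
    return pd.DataFrame({
        "A": np.repeat(df.A.values, [len(x) for x in chain.from_iterable(df.B)]),
        "B": list(chain.from_iterable(chain.from_iterable(df.B))),
    })


def with_stack():
    # Answer #1: apply(pd.Series) + stack + join
    return (df["B"].apply(lambda x: pd.Series(x[0]))
            .stack()
            .dropna()  # guard: drop NaN padding in case stack() keeps it
            .reset_index(level=1, drop=True)
            .to_frame("B")
            .join(df[["A"]], how="left"))


for fn in (with_repeat, with_stack):
    # Best of 3 repeats of 5 calls each, reported per call
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```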

#3


I can't find an elegant way to handle this, but the following code works:

import pandas as pd
import numpy as np
df = pd.DataFrame([{"a":1,"b":[[1,2]]},{"a":4, "b":[[3,4,5]]}])
z = []
for _, row in df.iterrows():
    # np.array(...).flat walks the nested list element by element
    for j in np.array(row.b).flat:
        z.append({'a': row.a, 'b': j})
result = pd.DataFrame(z)
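The same loop can be written without iterrows(), which builds a new Series for every row, by walking the two columns in parallel with zip. A minimal sketch, assuming each b cell is nested exactly two levels deep (np.array(...).flat in the loop above would also cope with deeper, regular nesting):

```python
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": [[1, 2]]}, {"a": 4, "b": [[3, 4, 5]]}])

# One record per innermost element, paired with that row's value of a
records = [
    {"a": a, "b": item}
    for a, cell in zip(df["a"], df["b"])
    for inner in cell      # each cell is a list of lists
    for item in inner      # each inner list holds the values to expand
]
result = pd.DataFrame(records)
print(result)
```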
