Convert pandas column values to rows

Time: 2021-04-24 07:38:22

I am trying to convert a dataframe to long form.

The dataframe I am starting with:

import pandas as pd

df = pd.DataFrame([['a', 'b'],
                   ['d', 'e'],
                   ['f', 'g', 'h'],
                   ['q', 'r', 'e', 't']])
df = df.rename(columns={0: "Key"})

  Key  1     2     3
0   a  b  None  None
1   d  e  None  None
2   f  g     h  None
3   q  r     e     t

The number of columns is not fixed; there may be more than 4. Each value after the key should become its own row.

This gets what I need; however, it seems there should be a way to do this without having to drop null values:

new_df = pd.melt(df, id_vars=['Key'])[['Key', 'value']]
new_df = new_df.dropna()


    Key value
0   a   b
1   d   e
2   f   g
3   q   r
6   f   h
7   q   e
11  q   t

3 solutions

#1


5  

Option 1
You should be able to do this with set_index + stack:

df.set_index('Key').stack().reset_index(level=0, name='value').reset_index(drop=True)

  Key value
0   a     b
1   d     e
2   f     g
3   f     h
4   q     r
5   q     e
6   q     t

If you don't want to keep resetting the index, then use an intermediate variable and create a new DataFrame:

v = df.set_index('Key').stack()
pd.DataFrame({'Key' : v.index.get_level_values(0), 'value' : v.values})

  Key value
0   a     b
1   d     e
2   f     g
3   f     h
4   q     r
5   q     e
6   q     t

The essence here is that stack automatically gets rid of NaNs by default (you can disable that by setting dropna=False).
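
Whether stack discards missing values on its own may depend on the pandas version in use (treat that as an assumption to check against your installed release). A minimal sketch that does not rely on it, using the question's data and an explicit dropna; the names stacked and long_df are illustrative:

```python
import pandas as pd

# Same frame as in the question; short rows are padded with missing values.
df = pd.DataFrame([['a', 'b'],
                   ['d', 'e'],
                   ['f', 'g', 'h'],
                   ['q', 'r', 'e', 't']]).rename(columns={0: 'Key'})

# stack() moves the column labels into an inner index level, giving one
# entry per (Key, column) pair. The explicit dropna() is redundant when
# stack() already discards missing values, and does the job when it does not.
stacked = df.set_index('Key').stack().dropna()

# Turn the 'Key' index level back into a column and discard the leftover
# column-label level.
long_df = stacked.reset_index(level=0, name='value').reset_index(drop=True)
print(long_df)
```

The extra dropna costs little and keeps the result the same whichever way stack treats missing values.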


Option 2
For better performance, use np.repeat together with ravel, NumPy's counterpart to pd.DataFrame.stack. Note that df.pop removes the Key column from df in place:

i = df.pop('Key').values
j = df.values.ravel()

pd.DataFrame({'Key': i.repeat(df.count(axis=1)), 'value': j[pd.notnull(j)]})

  Key value
0   a     b
1   d     e
2   f     g
3   f     h
4   q     r
5   q     e
6   q     t
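
A hedged variant of Option 2 that leaves df untouched instead of calling pop, in case the frame is needed afterwards. The names keys, values, mask, and result are illustrative and not part of the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['a', 'b'],
                   ['d', 'e'],
                   ['f', 'g', 'h'],
                   ['q', 'r', 'e', 't']]).rename(columns={0: 'Key'})

keys = df['Key'].to_numpy()
values = df.drop(columns='Key').to_numpy()   # 2-D array, one row per key

mask = pd.notna(values)                      # True where a value is present

result = pd.DataFrame({
    # Repeat each key once per non-missing value in its row.
    'Key': np.repeat(keys, mask.sum(axis=1)),
    # Boolean indexing flattens in row-major order, matching the repeats.
    'value': values[mask],
})
print(result)
```

Both columns are built in row-major order, which is what keeps each key aligned with its values.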

#2


5  

By using melt (I do not think dropna creates any more 'trouble' here):

df.melt('Key').dropna().drop(columns='variable')
Out[809]: 
   Key value
0    a     b
1    d     e
2    f     g
3    q     r
6    f     h
7    q     e
11   q     t
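
melt lays the columns out one after another, which is why the index above runs 0, 1, 2, 3, 6, 7, 11 rather than grouping rows by key. If key-grouped order matters, one possible sketch (assuming pandas 1.1 or later for ignore_index; melted and ordered are illustrative names) keeps the original row labels and sorts on them with a stable sort:

```python
import pandas as pd

df = pd.DataFrame([['a', 'b'],
                   ['d', 'e'],
                   ['f', 'g', 'h'],
                   ['q', 'r', 'e', 't']]).rename(columns={0: 'Key'})

# ignore_index=False keeps each value's original row label in the index.
melted = df.melt('Key', ignore_index=False).dropna()

# A stable sort on the row labels groups values by source row while
# preserving their left-to-right column order within each row.
ordered = (melted.sort_index(kind='mergesort')
                 .drop(columns='variable')
                 .reset_index(drop=True))
print(ordered)
```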

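And without dropna:

# Works only for single-character values: each row is concatenated into one
# string, which list() then splits back into characters.
s = df.fillna('').set_index('Key').sum(axis=1).apply(list)
pd.DataFrame({'Key': s.reindex(s.index.repeat(s.str.len())).index, 'value': s.sum()})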


Out[862]: 
  Key value
0   a     b
1   d     e
2   f     g
3   f     h
4   q     r
5   q     e
6   q     t

#3


2  

With a comprehension
This assumes the key is the first element of the row.

pd.DataFrame(
    [[k, v] for k, *r in df.values for v in r if pd.notna(v)],
    columns=['Key', 'value']
)

  Key value
0   a     b
1   d     e
2   f     g
3   f     h
4   q     r
5   q     e
6   q     t
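
If the key is not guaranteed to be the first column, a similar comprehension can select it by name instead. This is only a sketch; rows and result are illustrative names:

```python
import pandas as pd

df = pd.DataFrame([['a', 'b'],
                   ['d', 'e'],
                   ['f', 'g', 'h'],
                   ['q', 'r', 'e', 't']]).rename(columns={0: 'Key'})

# Pair each key with the remaining values of its row, skipping missing ones.
rows = [
    (key, value)
    for key, rest in zip(df['Key'], df.drop(columns='Key').to_numpy())
    for value in rest
    if pd.notna(value)
]

result = pd.DataFrame(rows, columns=['Key', 'value'])
print(result)
```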
