Convert a pandas Series of lists to a DataFrame

Time: 2022-06-17 04:31:51

I have a series made of lists

import pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6]])

and I want a DataFrame with each column a list.

None of from_items, from_records, DataFrame, or Series.to_frame seem to work.
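For reference, here is a quick check of why the last two attempts fall short (assuming a recent pandas version): both Series.to_frame and the plain DataFrame constructor keep each list as a single cell, giving one column of lists instead of one column per list.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Series.to_frame wraps the Series as a single column; each cell still holds a whole list.
framed = s.to_frame()
print(framed.shape)  # (2, 1)

# The plain DataFrame constructor treats a Series the same way.
direct = pd.DataFrame(s)
print(direct.shape)  # (2, 1)
print(direct.iloc[0, 0])  # [1, 2, 3]
```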

How can I do this?

5 solutions

#1

You can use from_items like this (assuming that your lists all have the same length; note that DataFrame.from_items was deprecated in pandas 0.23 and removed in pandas 1.0, so this only works on older versions):

pd.DataFrame.from_items(zip(s.index, s.values))

   0  1
0  1  4
1  2  5
2  3  6

or

pd.DataFrame.from_items(zip(s.index, s.values)).T

   0  1  2
0  1  2  3
1  4  5  6

depending on your desired output.
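On pandas versions without from_items, a minimal sketch of equivalents using DataFrame.from_dict: the default orientation makes each list a column, and orient="index" makes each list a row (the transposed layout). This assumes equal-length lists, just like the from_items calls above.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Map each index label to its list; dicts preserve insertion order in Python 3.7+.
mapping = dict(zip(s.index, s.values))

# Default orient="columns": each list becomes a column (like from_items(...)).
cols = pd.DataFrame.from_dict(mapping)

# orient="index": each list becomes a row (like from_items(...).T).
rows = pd.DataFrame.from_dict(mapping, orient="index")

print(cols)
print(rows)
```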

This can be noticeably faster than using apply (roughly 1.4 to 2 times in the timings below; apply is used in @Wen's answer, which, however, also works for lists of different lengths):

%timeit pd.DataFrame.from_items(zip(s.index, s.values))
1000 loops, best of 3: 669 µs per loop

%timeit s.apply(lambda x:pd.Series(x)).T
1000 loops, best of 3: 1.37 ms per loop

and

%timeit pd.DataFrame.from_items(zip(s.index, s.values)).T
1000 loops, best of 3: 919 µs per loop

%timeit s.apply(lambda x:pd.Series(x))
1000 loops, best of 3: 1.26 ms per loop
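The %timeit lines are IPython magics, and the numbers reflect the answerer's machine and pandas version at the time. A rough sketch for rerunning a similar comparison as a plain Python script with the standard timeit module; from_items is left out because it no longer exists in pandas 1.0+, and absolute timings will vary by machine, pandas version and input size.

```python
import timeit

import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Candidate conversions; each returns a DataFrame with one row per list.
candidates = {
    "apply": lambda: s.apply(lambda x: pd.Series(x)),
    "generator": lambda: pd.DataFrame(item for item in s),
    "tolist": lambda: pd.DataFrame(s.tolist()),
}

# Seconds per call: best total of `repeat` runs, each of `number` calls, divided by `number`.
number, repeat = 200, 3
results = {
    name: min(timeit.repeat(fn, number=number, repeat=repeat)) / number
    for name, fn in candidates.items()
}

for name, secs in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {secs * 1e6:8.1f} us per call")
```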

@Hatshepsut's answer is also quite fast and likewise works for lists of different lengths:

%timeit pd.DataFrame(item for item in s)
1000 loops, best of 3: 636 µs per loop

and

%timeit pd.DataFrame(item for item in s).T
1000 loops, best of 3: 884 µs per loop
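To illustrate the different-length case, a small sketch with made-up ragged data: the generator approach pads the shorter list with NaN, which typically turns the affected column into floats.

```python
import pandas as pd

# Hypothetical ragged input: the second list is one element shorter.
s = pd.Series([[1, 2, 3], [4, 5]])

# Each list becomes a row; the missing third value of the second row becomes NaN.
df = pd.DataFrame(item for item in s)
print(df)
```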

The fastest solution seems to be @Abdou's answer (tested on Python 2, where the function is itertools.izip_longest; it also works for lists of different lengths; in Python 3, use itertools.zip_longest instead):

%timeit pd.DataFrame.from_records(izip_longest(*s.values))
1000 loops, best of 3: 529 µs per loop

An additional option:

pd.DataFrame(dict(zip(s.index, s.values)))

   0  1
0  1  4
1  2  5
2  3  6
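Unlike the apply, generator and zip_longest approaches, this dict-based option (like from_items) assumes equal-length lists: with plain lists of different lengths the constructor raises a ValueError. A sketch of one workaround, using a hypothetical ragged Series, is to wrap each list in a pd.Series so that pandas aligns them on their index and pads the shorter one with NaN.

```python
import pandas as pd

# Hypothetical ragged input: the second list is one element shorter.
s = pd.Series([[1, 2, 3], [4, 5]])

# Plain lists of unequal length are rejected by the DataFrame constructor.
try:
    pd.DataFrame(dict(zip(s.index, s.values)))
    ragged_error = None
except ValueError as exc:
    ragged_error = exc

# Wrapping each list in a Series lets pandas align on the index and pad with NaN.
df = pd.DataFrame({key: pd.Series(values) for key, values in zip(s.index, s.values)})
print(df)
```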

#2

pd.DataFrame.from_records should also work using itertools.zip_longest:

from itertools import zip_longest

pd.DataFrame.from_records(zip_longest(*s.values))

#    0  1
# 0  1  4
# 1  2  5
# 2  3  6
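With lists of different lengths, zip_longest fills the gaps with None by default, which pandas stores as missing values. A small sketch with a hypothetical ragged Series; passing fillvalue (a standard zip_longest keyword) substitutes a value of your choice instead, which here keeps the columns integer-typed.

```python
from itertools import zip_longest

import pandas as pd

# Hypothetical ragged input: the second list is one element shorter.
s = pd.Series([[1, 2, 3], [4, 5]])

# Default: rows are (1, 4), (2, 5), (3, None); pandas treats None as missing.
padded = pd.DataFrame.from_records(zip_longest(*s.values))

# fillvalue=0 pads with 0 instead, so every column stays integer-typed.
zero_filled = pd.DataFrame.from_records(zip_longest(*s.values, fillvalue=0))

print(padded)
print(zero_filled)
```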

#3

You may be looking for:

s.apply(lambda x:pd.Series(x))
   0  1  2
0  1  2  3
1  4  5  6

Or

s.apply(lambda x:pd.Series(x)).T

   0  1
0  1  4
1  2  5
2  3  6
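Since pd.Series can be called directly on each list, the lambda is not strictly needed; passing the class itself to apply should give the same result. A brief sketch, assuming a recent pandas version. Keep in mind that apply builds one Series per element, which is why it tends to be slower than the list-based constructors.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Passing the pd.Series class directly is equivalent to lambda x: pd.Series(x).
by_lambda = s.apply(lambda x: pd.Series(x))
by_class = s.apply(pd.Series)

print(by_class.equals(by_lambda))  # True
print(by_class.T)  # transpose: one column per original list
```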

#4

Iterate over the series like this:

series = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(item for item in series)

   0  1  2
0  1  2  3
1  4  5  6

#5

If the Series is very long (more than about 1 million elements), you can use:

s = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(s.tolist())
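Note that pd.DataFrame(s.tolist()) puts each list in a row, so append .T if you want each list as a column, as the question asks. It also discards the Series' own index labels; a sketch with a hypothetical labelled Series shows passing index=s.index to keep them.

```python
import pandas as pd

# Hypothetical Series with non-default index labels.
s = pd.Series([[1, 2, 3], [4, 5, 6]], index=["a", "b"])

# One row per list, reusing the Series' labels as the row index.
rows = pd.DataFrame(s.tolist(), index=s.index)

# Transpose so that each original list becomes a column labelled "a" or "b".
cols = rows.T

print(rows)
print(cols)
```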
