I have a series made of lists
import pandas as pd
s = pd.Series([[1, 2, 3], [4, 5, 6]])
and I want a DataFrame with each column a list.
None of from_items, from_records, DataFrame, or Series.to_frame seem to work.
How to do this?
5 Answers
#1
9
You can use from_items like this (assuming that your lists are all the same length):
pd.DataFrame.from_items(zip(s.index, s.values))
   0  1
0  1  4
1  2  5
2  3  6
or
pd.DataFrame.from_items(zip(s.index, s.values)).T
   0  1  2
0  1  2  3
1  4  5  6
depending on your desired output.
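Note that from_items was deprecated in pandas 0.23 and removed in 1.0, so the calls above fail on current versions. A rough equivalent, sketched here with DataFrame.from_dict (the names as_columns and as_rows are only for illustration), might look like this:

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Map each index label to its list.
data = dict(zip(s.index, s))

# Default orient="columns": each list becomes one column.
as_columns = pd.DataFrame.from_dict(data)

# orient="index": each list becomes one row (same result as transposing).
as_rows = pd.DataFrame.from_dict(data, orient="index")

print(as_columns)
print(as_rows)
```

As with from_items, the column-wise form assumes the lists all have the same length.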
This can be much faster than using apply (as in @Wen's answer, which, however, also works for lists of different lengths):
%timeit pd.DataFrame.from_items(zip(s.index, s.values))
1000 loops, best of 3: 669 µs per loop
%timeit s.apply(lambda x:pd.Series(x)).T
1000 loops, best of 3: 1.37 ms per loop
and
%timeit pd.DataFrame.from_items(zip(s.index, s.values)).T
1000 loops, best of 3: 919 µs per loop
%timeit s.apply(lambda x:pd.Series(x))
1000 loops, best of 3: 1.26 ms per loop
@Hatshepsut's answer is also quite fast (and also works for lists of different lengths):
%timeit pd.DataFrame(item for item in s)
1000 loops, best of 3: 636 µs per loop
and
%timeit pd.DataFrame(item for item in s).T
1000 loops, best of 3: 884 µs per loop
The fastest solution seems to be @Abdou's answer (tested on Python 2; it also works for lists of different lengths; on Python 3, use itertools.zip_longest instead of izip_longest):
%timeit pd.DataFrame.from_records(izip_longest(*s.values))
1000 loops, best of 3: 529 µs per loop
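The %timeit figures above come from an IPython session on a two-element series. Outside IPython, a rough comparison can be sketched with the standard-library timeit module. This is only an illustration: the candidates and timings names are made up here, and the absolute numbers will differ by machine and pandas version.

```python
import timeit
from itertools import zip_longest

import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]])

# Hypothetical registry of approaches to compare; add others as needed.
candidates = {
    "generator": lambda: pd.DataFrame(item for item in s),
    "zip_longest": lambda: pd.DataFrame.from_records(list(zip_longest(*s))),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats of 100 calls each, converted to seconds per call.
    timings[name] = min(timeit.repeat(func, number=100, repeat=3)) / 100

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e6:.0f} µs per call")
```

On such a tiny series the differences are mostly constant overhead, so it is worth rerunning with data of realistic size before choosing an approach.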
An additional option:
pd.DataFrame(dict(zip(s.index, s.values)))
   0  1
0  1  4
1  2  5
2  3  6
#2
3
pd.DataFrame.from_records should also work when combined with itertools.zip_longest:
from itertools import zip_longest
pd.DataFrame.from_records(zip_longest(*s.values))
#    0  1
# 0  1  4
# 1  2  5
# 2  3  6
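To see why zip_longest helps with lists of different lengths, here is a small sketch with a ragged series (the ragged and rows names are only for illustration). The shorter list is padded with None, which pandas then treats as a missing value:

```python
from itertools import zip_longest

import pandas as pd

ragged = pd.Series([[1, 2, 3], [4, 5]])

# zip_longest pads the shorter list with None, so every tuple has two entries.
rows = list(zip_longest(*ragged))
print(rows)

# Each tuple becomes one row; each original list becomes one column.
df = pd.DataFrame.from_records(rows)
print(df)
```

A plain zip would instead stop at the shortest list and silently drop the trailing 3.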
#3
1
You may be looking for
s.apply(lambda x:pd.Series(x))
   0  1  2
0  1  2  3
1  4  5  6
Or
s.apply(lambda x:pd.Series(x)).T
Out[133]:
   0  1
0  1  4
1  2  5
2  3  6
#4
1
Iterate over the series like this:
series = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(item for item in series)
   0  1  2
0  1  2  3
1  4  5  6
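The same approach should also cope with lists of different lengths, since the DataFrame constructor pads short rows with missing values. A minimal sketch, assuming a ragged series named ragged (a made-up name):

```python
import pandas as pd

ragged = pd.Series([[1, 2, 3], [4, 5]])

# Each list becomes one row; the shorter row is padded with a missing value.
df = pd.DataFrame(item for item in ragged)
print(df)

# Transpose if each list should be a column instead.
df_columns = df.T
print(df_columns)
```

Because of the padding, columns that contain a missing value are typically no longer plain integer columns.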
#5
1
If the series is very long (more than 1 million rows), you can use:
s = pd.Series([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(s.tolist())
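One thing to watch with tolist() is that it discards the series index. If the series has meaningful labels, a sketch like the following (the "a"/"b" labels are invented for the example) passes the index back in explicitly:

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [4, 5, 6]], index=["a", "b"])

# tolist() returns a plain list of lists, so reattach the original labels.
df = pd.DataFrame(s.tolist(), index=s.index)
print(df)

# Transpose to get one column per list, labelled by the original index.
df_columns = df.T
print(df_columns)
```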