I have a Pandas data frame that has 51034 rows and 10 columns. I want to slice this data frame into 158 smaller data frames based on a list that contains the rows to slice.
我有一个熊猫数据帧,它有51034行和10列。我想基于包含要切片的行的列表将这个数据帧分割成158个较小的数据帧。
How is it possible to slice a pandas data frame into smaller data frames?
如何可能将熊猫的数据帧分割成更小的数据帧?
For example, if I have a data frame with 10 rows and 4 columns:
例如,如果我有一个包含10行和4列的数据框架:
A B C D
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
3 13 14 15 16
4 17 18 19 20
5 21 22 23 24
6 25 26 27 28
7 29 30 31 32
8 33 34 35 36
9 37 38 39 40
This example data frame will be sliced every 2 rows to create 5 new smaller data frames:
本例数据帧每两行被分割成5个新的更小的数据帧:
DataFrame1:
A B C D
0 1 2 3 4
1 5 6 7 8
DataFrame2:
A B C D
0 9 10 11 12
1 13 14 15 16
DataFrame3:
A B C D
0 17 18 19 20
1 21 22 23 24
DataFrame4:
A B C D
0 25 26 27 28
1 29 30 31 32
DataFrame5:
A B C D
0 33 34 35 36
1 37 38 39 40
I am not sure how to use the slice the larger data frame to create the smaller data frames.
我不知道如何使用切片更大的数据帧来创建更小的数据帧。
Any suggestions on how to accomplish this goal?
对于如何实现这一目标有什么建议吗?
Thank you.
谢谢你!
Rodrigo
罗德里戈
1 个解决方案
#1
2
You can use groupby
with a simple index-to-group mapping function, assuming that the index is consecutive and starts from 0:
您可以使用一个简单的indexto -group映射函数来使用groupby,假设该索引是连续的,并且从0开始:
for _, df_k in df.groupby(lambda x: x/2):
print df_k.reset_index(drop=True)
Output:
输出:
A B C D
0 1 2 3 4
1 5 6 7 8
A B C D
0 9 10 11 12
1 13 14 15 16
A B C D
0 17 18 19 20
1 21 22 23 24
A B C D
0 25 26 27 28
1 29 30 31 32
A B C D
0 33 34 35 36
1 37 38 39 40
If you have a list of numbers indicating the slicing positions, you can pass in a dictionary as the group mapping:
如果您有一个数字列表,指示切片位置,您可以传入字典作为组映射:
import numpy as np
slice_at = [3, 5]
group_sizes = np.diff([0] + slice_at + [len(df)])
mapping = dict(zip(df.index, np.repeat(range(len(group_sizes)), group_sizes)))
for _, df_k in df.groupby(mapping):
print df_k.reset_index(drop=True)
Output:
输出:
A B C D
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
A B C D
0 13 14 15 16
1 17 18 19 20
A B C D
0 21 22 23 24
1 25 26 27 28
2 29 30 31 32
3 33 34 35 36
4 37 38 39 40
#1
2
You can use groupby
with a simple index-to-group mapping function, assuming that the index is consecutive and starts from 0:
您可以使用一个简单的indexto -group映射函数来使用groupby,假设该索引是连续的,并且从0开始:
for _, df_k in df.groupby(lambda x: x/2):
print df_k.reset_index(drop=True)
Output:
输出:
A B C D
0 1 2 3 4
1 5 6 7 8
A B C D
0 9 10 11 12
1 13 14 15 16
A B C D
0 17 18 19 20
1 21 22 23 24
A B C D
0 25 26 27 28
1 29 30 31 32
A B C D
0 33 34 35 36
1 37 38 39 40
If you have a list of numbers indicating the slicing positions, you can pass in a dictionary as the group mapping:
如果您有一个数字列表,指示切片位置,您可以传入字典作为组映射:
import numpy as np
slice_at = [3, 5]
group_sizes = np.diff([0] + slice_at + [len(df)])
mapping = dict(zip(df.index, np.repeat(range(len(group_sizes)), group_sizes)))
for _, df_k in df.groupby(mapping):
print df_k.reset_index(drop=True)
Output:
输出:
A B C D
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
A B C D
0 13 14 15 16
1 17 18 19 20
A B C D
0 21 22 23 24
1 25 26 27 28
2 29 30 31 32
3 33 34 35 36
4 37 38 39 40