How to loop over grouped pandas data?

DataFrame:

  c_os_family_ss c_os_major_is l_customer_id_i
0      Windows 7                         90418
1      Windows 7                         90418
2      Windows 7                         90418

Code:

print(df)
for name, group in df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)):
    print(name)
    print(group)

I'm trying to just loop over the aggregated data, but I get the error:

ValueError: too many values to unpack

@EdChum, here's the expected output:

                                                    c_os_family_ss  \
l_customer_id_i
131572           Windows 7,Windows 7,Windows 7,Windows 7,Window...
135467           Windows 7,Windows 7,Windows 7,Windows 7,Window...

                                                     c_os_major_is
l_customer_id_i
131572           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
135467           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...

The output is not the problem; I want to loop over every group.

3 Solutions

#1

df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)) already returns a DataFrame, so you can no longer loop over the groups.
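
To see where the error comes from, here is a minimal sketch. The sample values are made up to mirror the frame in the question. Iterating over a DataFrame yields its column labels, so `for name, group in ...` tries to unpack a string such as 'c_os_family_ss' into two names and fails.

```python
import pandas as pd

# Hypothetical sample data mirroring the frame in the question
df = pd.DataFrame({
    "c_os_family_ss": ["Windows 7", "Windows 7", "Windows 7"],
    "c_os_major_is": ["", "", ""],
    "l_customer_id_i": [90418, 90418, 90418],
})

# agg() returns a plain DataFrame indexed by the group key
aggregated = df.groupby("l_customer_id_i").agg(lambda x: ",".join(x))

# Iterating over a DataFrame yields column labels, not (name, group) pairs
labels = list(aggregated)

error_message = None
try:
    for name, group in aggregated:
        pass
except ValueError as exc:
    # Each label is a long string, so unpacking it into two names fails
    error_message = str(exc)

print(labels)
print(error_message)
```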

In general:

df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), which you can iterate over to get the groups (as explained in the pandas docs). You can do something like:

```
grouped = df.groupby('A')

for name, group in grouped:
    ...
```
When you apply a function to the groupby, as in your example df.groupby(...).agg(...) (this could also be transform, apply, mean, ...), the results of applying the function to each group are combined into a single DataFrame (the apply and combine steps of groupby's 'split-apply-combine' paradigm). The result is therefore always a DataFrame again (or a Series, depending on the applied function).
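
As a rough illustration of the difference, the sketch below loops over the GroupBy object itself and does the join inside the loop, rather than calling agg first. The sample data and the `joined` dictionary are assumptions made for this example, not part of the question.

```python
import pandas as pd

# Hypothetical sample data with two customers
df = pd.DataFrame({
    "c_os_family_ss": ["Windows 7", "Windows 7", "Windows XP"],
    "l_customer_id_i": [90418, 90418, 131572],
})

joined = {}
for name, group in df.groupby("l_customer_id_i"):
    # name is the group key; group is the sub-DataFrame for that key
    joined[name] = ",".join(group["c_os_family_ss"])

for name, value in joined.items():
    print(name, value)
```

Here each iteration sees the raw rows of one group, so any per-group processing can happen before (or instead of) the aggregation.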

#2

You can iterate over the index values if your dataframe has already been created.

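# Aggregate first; the group keys become the index of the result
df = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))
for name in df.index:
    print(name)
    print(df.loc[name])  # the aggregated row for this group key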
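
A possible variation on the same idea, sketched under the assumption that the aggregated frame is small: DataFrame.iterrows() yields (index, row) pairs, which gives the `for name, ... in` shape the question was after. The sample data and the `rows` dictionary are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with two customers
df = pd.DataFrame({
    "c_os_family_ss": ["Windows 7", "Windows 7", "Windows XP"],
    "c_os_major_is": ["", "", ""],
    "l_customer_id_i": [90418, 90418, 131572],
})
aggregated = df.groupby("l_customer_id_i").agg(lambda x: ",".join(x))

rows = {}
for name, row in aggregated.iterrows():
    # name is the group key (the index value); row is a Series of aggregated values
    rows[name] = row["c_os_family_ss"]
    print(name)
    print(row)
```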

#3

Below is an example of how to iterate over groups of columns of a table to generate "create" statements from a frame describing a database:
