I have a DataFrame with the population of each city. I want to calculate the average population for each state from the populations of the cities within that state.
Here's a sample of the data:
State  City        Population  State Ave
CA     San Diego   10000       ??
CA     Palo Alto   8000        ??
CA     Marin       5000        ??
SC     Columbia    4000        ??
SC     Charleston  3000        ??
SC     Greenville  4000        ??
I can retrieve the averages with:
import pandas as pd

data = pd.read_csv('/Downloads/test.csv')
grouped = data.groupby("State")
# Mean population per state (one row per state)
print(grouped["Population"].mean())
State Population
CA 7666.66666667
SC 3666.66666667
But how do I assign the state average to each city?
Note: This small example is a simplified version of a larger problem, and the data above is obviously not real.
2 solutions
#1
You could use transform and place the result in df['Avg']:
In [216]: df['Avg'] = df.groupby('State')['Population'].transform('mean')
In [217]: df
Out[217]:
   State        City  Population          Avg
0     CA    SanDiego       10000  7666.666667
1     CA    PaloAlto        8000  7666.666667
2     CA       Marin        5000  7666.666667
3     SC    Columbia        4000  3666.666667
4     SC  Charleston        3000  3666.666667
5     SC  Greenville        4000  3666.666667
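For anyone who wants to try this without the CSV, here is a minimal self-contained sketch. It assumes the sample rows from the question are typed in by hand. The name per_state is only illustrative. It shows why transform fits here: mean() returns one row per state, while transform('mean') returns one value per original row, so it can be assigned straight back as a column.

```python
import pandas as pd

# Sample rows copied from the question (illustrative, not real figures).
df = pd.DataFrame({
    "State": ["CA", "CA", "CA", "SC", "SC", "SC"],
    "City": ["San Diego", "Palo Alto", "Marin",
             "Columbia", "Charleston", "Greenville"],
    "Population": [10000, 8000, 5000, 4000, 3000, 4000],
})

# mean() collapses each group to a single value, indexed by State.
per_state = df.groupby("State")["Population"].mean()

# transform("mean") broadcasts each group's mean back onto every row
# of that group, so the result shares df's index.
df["Avg"] = df.groupby("State")["Population"].transform("mean")

print(per_state)
print(df)
```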
#2
# Compute the mean population per state once
mean = df.groupby('State')['Population'].mean()
# Look up each row's state in that result
df['mean'] = df['State'].map(mean)
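As a rough check on the lookup approach, the sketch below builds the question's sample rows by hand (an assumption, since the CSV is not available) and maps each row's State onto the per-state means. The names state_mean and via_transform are only illustrative. It then compares the result with the transform version.

```python
import pandas as pd

# Sample rows copied from the question (illustrative, not real figures).
df = pd.DataFrame({
    "State": ["CA", "CA", "CA", "SC", "SC", "SC"],
    "City": ["San Diego", "Palo Alto", "Marin",
             "Columbia", "Charleston", "Greenville"],
    "Population": [10000, 8000, 5000, 4000, 3000, 4000],
})

# One mean per state, indexed by the state code.
state_mean = df.groupby("State")["Population"].mean()

# Series.map looks up each row's State value in state_mean's index.
df["mean"] = df["State"].map(state_mean)

# The same column computed with transform, for comparison.
via_transform = df.groupby("State")["Population"].transform("mean")
max_diff = (df["mean"] - via_transform).abs().max()

print(df)
print("largest difference between the two approaches:", max_diff)
```

Both approaches should give the same column. The map version keeps the per-state Series around, which may be handy if the averages are needed elsewhere.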