Reshape pandas columns to allow the sum instead of all values

Time: 2022-09-29 07:37:33

I have a data frame with 10 columns which successfully loads into a classifier. Now I am trying to load the sum of the columns instead of all 10 columns.

import pandas as pd

previous_games_stats = pd.read_csv('stats/2016-2017 CANUCKS STATS.csv', header=1)
numGamesToLookBack = 10

X = previous_games_stats[['GF', 'GA']]

X = X[0:numGamesToLookBack] #num games to look back
stats_feature_names = list(X.columns.values)

totals = pd.DataFrame(X, columns=stats_feature_names)

y = previous_games_stats['Unnamed: 7'] #outcome variable (win/loss)
y = y[numGamesToLookBack+1]

df = pd.DataFrame(iris.data, columns=iris.feature_names)
stats_df = pd.DataFrame(X, columns=stats_feature_names).sum()

The final line (with .sum() at the end) causes stats_df to go from being formatted like:

   GF  GA
0   2   1
1   4   3
2   2   1
3   2   1
4   3   4
5   2   4
6   0   3
7   0   2
8   2   5
9   0   3

to:

GF    17
GA    27

But I want to keep the same format, so the end result should be this:

    GF    GA
0   17    27

Since it is getting re-formatted, I am getting the following error:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 4 but corresponding boolean dimension is 3

What can I do to make the format stay the same?

1 solution

#1


Calling sum on a DataFrame returns a Series, with one entry per column. To get a one-row DataFrame instead, convert the Series to a frame and transpose it:

stats_df = pd.DataFrame(X, columns=stats_feature_names).sum().to_frame().T
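
To see what each step in that chain does, here is a minimal sketch. It assumes the GF/GA values shown in the question and rebuilds them by hand rather than reading the CSV, so that it runs on its own; the intermediate names totals and as_column are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question (hand-built, not read from the CSV).
X = pd.DataFrame({
    "GF": [2, 4, 2, 2, 3, 2, 0, 0, 2, 0],
    "GA": [1, 3, 1, 1, 4, 4, 3, 2, 5, 3],
})

totals = X.sum()               # Series indexed by column name (GF, GA)
as_column = totals.to_frame()  # DataFrame with 2 rows (GF, GA) and a single column
stats_df = as_column.T         # transpose: 1 row, columns GF and GA

print(type(totals).__name__)
print(stats_df)
```

The transpose matters because to_frame() alone puts the column names in the index, which is still not the one-row layout the question asks for.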

Another solution:

df1 = pd.DataFrame(X, columns=stats_feature_names)
stats_df = pd.DataFrame([df1.sum().values], columns=df1.columns)
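
The second approach can be sketched the same way. This example again assumes the GF/GA values from the question, built by hand, and it takes the column labels from df1 itself; that is an assumption, since in the question's code the name df refers to a separate iris frame with different columns.

```python
import pandas as pd

# Sample data copied from the table in the question (hand-built, not read from the CSV).
df1 = pd.DataFrame({
    "GF": [2, 4, 2, 2, 3, 2, 0, 0, 2, 0],
    "GA": [1, 3, 1, 1, 4, 4, 3, 2, 5, 3],
})

# df1.sum().values is a 1-D array of column totals.
# Wrapping it in a list makes pandas treat it as a single row.
stats_df = pd.DataFrame([df1.sum().values], columns=df1.columns)

print(stats_df)
print(stats_df.shape)
```

Without the surrounding list, the array of totals would be read as one column of values and would not line up with the two column labels.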
