I've been very confused about how axes are defined in pandas, and whether they refer to a DataFrame's rows or columns. Consider the code below:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], columns=["col1", "col2", "col3", "col4"])
>>> df
   col1  col2  col3  col4
0     1     1     1     1
1     2     2     2     2
2     3     3     3     3
So if we call df.mean(axis=1), we'll get a mean across the rows:
>>> df.mean(axis=1)
0    1.0
1    2.0
2    3.0
dtype: float64
However, if we call df.drop(name, axis=1), we actually drop a column, not a row:
>>> df.drop("col4", axis=1)
   col1  col2  col3
0     1     1     1
1     2     2     2
2     3     3     3
Can someone help me understand what is meant by an "axis" in pandas/numpy/scipy?
A side note: DataFrame.mean just might be defined wrong. The documentation for DataFrame.mean says that axis=1 is supposed to mean a mean over the columns, not the rows...
4 Answers
#1
133
It's perhaps simplest to remember it as 0=down and 1=across.
This means:
- Use axis=0 to apply a method down each column, or to the row labels (the index).
- Use axis=1 to apply a method across each row, or to the column labels.
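The "0=down, 1=across" rule can be checked with a quick sketch. It reuses the DataFrame from the question and swaps mean for sum, an arbitrary choice made only so the results are easy to tell apart:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=0: walk down each column, so there is one result per column.
down = df.sum(axis=0)

# axis=1: walk across each row, so there is one result per row.
across = df.sum(axis=1)

print(down.tolist())    # [6, 6, 6, 6]
print(across.tolist())  # [4, 8, 12]
```

The result of axis=0 is labelled by the column names, and the result of axis=1 is labelled by the row index.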
[Figure: the parts of a DataFrame that each axis refers to]
It's also useful to remember that pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:
Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]
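The glossary wording can be tried directly on a plain NumPy array. This is a small sketch that uses the same numbers as the question, with sum as a stand-in reduction:

```python
import numpy as np

# Shape (3, 4): 3 rows, 4 columns.
arr = np.array([[1, 1, 1, 1],
                [2, 2, 2, 2],
                [3, 3, 3, 3]])

# Reducing along axis 0 runs down the rows and leaves one value per column.
col_sums = arr.sum(axis=0)

# Reducing along axis 1 runs across the columns and leaves one value per row.
row_sums = arr.sum(axis=1)

print(col_sums.tolist())  # [6, 6, 6, 6]
print(row_sums.tolist())  # [4, 8, 12]
```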
So, concerning the method in the question, df.mean(axis=1) seems to be correctly defined. It takes the mean of entries horizontally across columns, that is, along each individual row. On the other hand, df.mean(axis=0) would be an operation acting vertically downwards across rows.
Similarly, df.drop(name, axis=1) refers to an action on column labels, because they intuitively go across the horizontal axis. Specifying axis=0 would make the method act on rows instead.
#2
6
Another way to explain:
# Not realistic but ideal for understanding the axis parameter
df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
                  columns=["idx1", "idx2", "idx3", "idx4"],
                  index=["idx1", "idx2", "idx3"])
---------------------------------------1
|       idx1  idx2  idx3  idx4
| idx1     1     1     1     1
| idx2     2     2     2     2
| idx3     3     3     3     3
0
About df.drop (axis means the position)
A: I wanna remove idx3.
B: Which one? // typing while waiting for a response: df.drop("idx3",
A: The one which is on axis 1.
B: OK, then it is >> df.drop("idx3", axis=1)
# Result
---------------------------------------1
|       idx1  idx2  idx4
| idx1     1     1     1
| idx2     2     2     2
| idx3     3     3     3
0
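Because the label "idx3" exists on both axes in this example, the axis argument is the only thing that tells drop which one to remove. A minimal sketch of both choices, together with the index= and columns= keywords of DataFrame.drop that name the axis directly:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["idx1", "idx2", "idx3", "idx4"],
    index=["idx1", "idx2", "idx3"],
)

no_col = df.drop("idx3", axis=1)  # removes the column labelled "idx3"
no_row = df.drop("idx3", axis=0)  # removes the row labelled "idx3"

print(no_col.shape)  # (3, 3)
print(no_row.shape)  # (2, 4)

# Keyword forms that spell out the axis; they give the same results.
same_col = df.drop(columns="idx3")
same_row = df.drop(index="idx3")
```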
About df.apply (axis means direction)
A: I wanna apply sum.
B: Which direction? // typing while waiting for a response: df.apply(lambda x: x.sum(),
A: The one which is parallel to axis 0.
B: OK, then it is >> df.apply(lambda x: x.sum(), axis=0)
# Result
idx1    6
idx2    6
idx3    6
idx4    6
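For comparison, here is a short sketch of the same apply call in both directions, using the DataFrame defined in this answer. With axis=0 the function receives each column in turn; with axis=1 it receives each row:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["idx1", "idx2", "idx3", "idx4"],
    index=["idx1", "idx2", "idx3"],
)

# Parallel to axis 0: each column is passed to the function.
col_totals = df.apply(lambda s: s.sum(), axis=0)

# Parallel to axis 1: each row is passed to the function.
row_totals = df.apply(lambda s: s.sum(), axis=1)

print(col_totals.tolist())  # [6, 6, 6, 6]
print(row_totals.tolist())  # [4, 8, 12]
```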
#3
2
There are already correct answers, but here is another example with more than 2 dimensions.
The parameter axis names the axis that is changed by the operation.
For example, consider data with dimensions a x b x c (a pandas DataFrame itself has only two dimensions, so think of this as an n-dimensional array).
- df.mean(axis=1) returns a result with dimension a x 1 x c.
- df.drop("col4", axis=1) returns a result with dimension a x (b-1) x c.
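Since a DataFrame is two-dimensional, the shapes above are easiest to check on a NumPy array. The sketch below is an illustration under that substitution: the sizes a, b, c are arbitrary, np.mean with keepdims=True stands in for df.mean, and np.delete stands in for df.drop.

```python
import numpy as np

a, b, c = 2, 3, 4  # arbitrary sizes chosen for illustration
arr = np.arange(a * b * c).reshape(a, b, c)

# Averaging over axis 1 collapses that axis; keepdims=True keeps it with length 1.
kept = arr.mean(axis=1, keepdims=True)
reduced = arr.mean(axis=1)  # without keepdims the axis disappears entirely

# Deleting one entry along axis 1 shortens that axis by one.
dropped = np.delete(arr, 0, axis=1)

print(kept.shape)     # (2, 1, 4)
print(reduced.shape)  # (2, 4)
print(dropped.shape)  # (2, 2, 4)
```

In both cases only axis 1 changes length; the other two axes keep their sizes.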
#4
0
It should be more widely known that the string aliases 'index' and 'columns' can be used in place of the integers 0/1. The aliases are much more explicit and help me remember how the calculations take place. Another alias for 'index' is 'rows'.
When axis='index' is used, the calculations happen down the columns, which is confusing. But I remember it as getting a result that is the same size as another row.
Let's get some data on the screen to see what I am talking about:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 4), columns=list('abcd'))
          a         b         c         d
0  0.990730  0.567822  0.318174  0.122410
1  0.144962  0.718574  0.580569  0.582278
2  0.477151  0.907692  0.186276  0.342724
3  0.561043  0.122771  0.206819  0.904330
4  0.427413  0.186807  0.870504  0.878632
5  0.795392  0.658958  0.666026  0.262191
6  0.831404  0.011082  0.299811  0.906880
7  0.749729  0.564900  0.181627  0.211961
8  0.528308  0.394107  0.734904  0.961356
9  0.120508  0.656848  0.055749  0.290897
When we want to take the mean of all the columns, we use axis='index' to get the following:
df.mean(axis='index')
a    0.562664
b    0.478956
c    0.410046
d    0.546366
dtype: float64
The same result is obtained with:
df.mean()  # default is axis=0
df.mean(axis=0)
df.mean(axis='rows')
To apply an operation left to right across the rows, use axis='columns'. I remember it by thinking that an additional column may be added to my DataFrame:
df.mean(axis='columns')
0    0.499784
1    0.506596
2    0.478461
3    0.448741
4    0.590839
5    0.595642
6    0.512294
7    0.427054
8    0.654669
9    0.281000
dtype: float64
The same result is obtained with:
df.mean(axis=1)
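The equivalence between the string aliases and the integers can be confirmed with a small sketch. The seeded generator is an assumption added here only so that the data is reproducible; it is not part of the original example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
df = pd.DataFrame(rng.random((10, 4)), columns=list("abcd"))

by_index = df.mean(axis="index")      # one value per column
by_columns = df.mean(axis="columns")  # one value per row

# The string aliases give exactly the same results as the integers.
same_as_zero = by_index.equals(df.mean(axis=0))
same_as_one = by_columns.equals(df.mean(axis=1))

print(by_index.shape)    # (4,)
print(by_columns.shape)  # (10,)
print(same_as_zero, same_as_one)  # True True
```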
Add a new row with axis=0/index/rows
Let's use these results to add additional rows or columns to complete the explanation. Whenever you use axis=0/index/rows, it's like getting a new row of the DataFrame. Let's add a row:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([df, df.mean(axis='rows').to_frame().T], ignore_index=True)
           a         b         c         d
0   0.990730  0.567822  0.318174  0.122410
1   0.144962  0.718574  0.580569  0.582278
2   0.477151  0.907692  0.186276  0.342724
3   0.561043  0.122771  0.206819  0.904330
4   0.427413  0.186807  0.870504  0.878632
5   0.795392  0.658958  0.666026  0.262191
6   0.831404  0.011082  0.299811  0.906880
7   0.749729  0.564900  0.181627  0.211961
8   0.528308  0.394107  0.734904  0.961356
9   0.120508  0.656848  0.055749  0.290897
10  0.562664  0.478956  0.410046  0.546366
Add a new column with axis=1/columns
Similarly, axis=1/columns creates data that can easily be made into its own column:
df.assign(e=df.mean(axis='columns'))
          a         b         c         d         e
0  0.990730  0.567822  0.318174  0.122410  0.499784
1  0.144962  0.718574  0.580569  0.582278  0.506596
2  0.477151  0.907692  0.186276  0.342724  0.478461
3  0.561043  0.122771  0.206819  0.904330  0.448741
4  0.427413  0.186807  0.870504  0.878632  0.590839
5  0.795392  0.658958  0.666026  0.262191  0.595642
6  0.831404  0.011082  0.299811  0.906880  0.512294
7  0.749729  0.564900  0.181627  0.211961  0.427054
8  0.528308  0.394107  0.734904  0.961356  0.654669
9  0.120508  0.656848  0.055749  0.290897  0.281000
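The "new row" and "new column" mnemonic can be checked by looking at the resulting shapes. This is a sketch under two assumptions: a seeded generator replaces np.random.rand so the data is reproducible, and pd.concat is used to add the row because DataFrame.append is not available in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
df = pd.DataFrame(rng.random((10, 4)), columns=list("abcd"))

col_means = df.mean(axis="index")    # length 4: fits as an extra row
row_means = df.mean(axis="columns")  # length 10: fits as an extra column

# Turn the column means into a one-row frame and stack it under df.
with_row = pd.concat([df, col_means.to_frame().T], ignore_index=True)

# Attach the row means as a new column named "e".
with_col = df.assign(e=row_means)

print(with_row.shape)  # (11, 4)
print(with_col.shape)  # (10, 5)
```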
It appears that you can see all the aliases with the following private attributes (being private, they may not exist in every pandas version):
df._AXIS_ALIASES
{'rows': 0}
df._AXIS_NUMBERS
{'columns': 1, 'index': 0}
df._AXIS_NAMES
{0: 'index', 1: 'columns'}