What is the difference between using loc and using square brackets to filter columns in Pandas/Python?

Time: 2021-01-18 21:41:15

I've noticed three methods of selecting a column in a Pandas DataFrame:

First method of selecting a column using loc:

df_new = df.loc[:, 'col1']

Second method - seems simpler and faster:

df_new = df['col1']

Third method - most convenient:

df_new = df.col1

Is there a difference between these three methods? I don't think so, in which case I'd rather use the third method.

I'm mostly curious as to why there appear to be three methods for doing the same thing.

1 solution

#1


5  

If you are selecting a single column, a list of columns, or a slice of rows, then there is no difference. However, [] does not allow you to select a single row, a list of rows, or a slice of columns. More importantly, if your selection involves both rows and columns, assignment becomes problematic:

df[1:3]['A'] = 5

This selects rows 1 and 2, then selects column 'A' of the returned object and assigns the value 5 to it. The problem is that the returned object might be a copy, so this may not change the actual DataFrame. pandas raises a SettingWithCopyWarning in this situation. The correct way to make this assignment is:

df.loc[1:3, 'A'] = 5

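With .loc, you are guaranteed to modify the original DataFrame. Keep in mind that .loc slices by label and includes both endpoints, so with a default integer index df.loc[1:3, 'A'] covers rows 1, 2 and 3; df.loc[1:2, 'A'] is the exact counterpart of the rows selected by df[1:3]. .loc also allows you to slice columns (df.loc[:, 'C':'F']), select a single row (df.loc[5]), and select a list of rows (df.loc[[1, 2, 5]]).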
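To make the points above concrete, here is a minimal sketch. The frame, its column names A to F, and the default integer index are assumptions made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical frame: default integer index 0..5, columns A..F.
df = pd.DataFrame({c: list(range(6)) for c in "ABCDEF"})

# For a single column, [] and .loc return the same Series.
assert df["A"].equals(df.loc[:, "A"])

# [] with an integer slice is positional: positions 1 and 2 (end excluded).
rows_bracket = df[1:3]

# .loc slices by label: labels 1 through 3 (end included).
rows_loc = df.loc[1:3]

# A single .loc call with both row and column selectors writes to df itself.
df.loc[1:2, "A"] = 5

# Selections that plain [] cannot express.
col_slice = df.loc[:, "C":"F"]   # slice of columns
one_row = df.loc[5]              # single row, returned as a Series
some_rows = df.loc[[1, 2, 5]]    # list of rows
```

The difference in slice endpoints between rows_bracket and rows_loc is the main thing to watch when converting a chained assignment to .loc.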

Also note that these two were not included in the API at the same time. .loc was added much later as a more powerful and explicit indexer. See unutbu's answer for more detail.

Note: Getting columns with [] vs . (attribute access) is a completely different topic. Attribute access is only there for convenience. It only allows accessing columns whose names are valid Python identifiers (e.g. they cannot contain spaces, and they cannot consist only of digits). It cannot be used when a name conflicts with a Series/DataFrame method or attribute. It also cannot be used to create columns (e.g. if there is no column a, the assignment df.a = 1 sets a plain Python attribute on the object instead of adding a column). Other than that, . and [] are the same.
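The limits of attribute access can be seen in a short sketch. The column names here ("col1", "my col", "count") are hypothetical and chosen only to trigger each case.

```python
import pandas as pd

# Hypothetical frame: one ordinary name, one name with a space, and one
# name that collides with the DataFrame.count method.
df = pd.DataFrame({"col1": [1, 2], "my col": [3, 4], "count": [5, 6]})

# An ordinary name works both ways.
assert df.col1.equals(df["col1"])

# "my col" is not a valid identifier, so only [] can reach it.
spaced = df["my col"]

# df.count resolves to the method, not the column; [] returns the column.
is_method = callable(df.count)
count_col = df["count"]

# Assigning to a new attribute does not create a column.
df.a = 1
has_a_column = "a" in df.columns

# Assigning through [] does create the column.
df["b"] = 1
has_b_column = "b" in df.columns
```

Because of these cases, [] is the safer default in code that has to work with arbitrary column names.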
