Pandas DataFrame: selecting a list of rows from a DatetimeIndex raises KeyError. Understanding why

I'm trying to understand why I get this error. I already have a solution for this issue (it was actually solved here); I just need to understand why it doesn't work as I expected.

I would like to understand why this throws a KeyError:

import numpy as np
import pandas as pd
dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))
df.loc[['20130102', '20130103'],:]

with the following error:

KeyError: "None of [['20130102', '20130103']] are in the [index]"

As explained here, the solution is just to do:

df.loc[pd.to_datetime(['20130102','20130104']),:]

So the problem is definitely with the way loc handles a list of strings as the key for selecting from a DatetimeIndex. However, the following calls work fine:

df.loc['20130102':'20130104',:]

and

df.loc['20130102']

I would like to understand how this works, and I would appreciate any resources I can use to predict the behavior of this function depending on how it is called. I read the "Indexing and Selecting Data" and "Time Series / Date functionality" pages of the pandas documentation but couldn't find an explanation for this.

1 solution

#1

Typically, when you pass an array-like object to loc, Pandas tries to locate each element of that array in the index. If it doesn't find them, you get a KeyError. And you passed an array of strings while the values in the index are Timestamps, so those strings definitely aren't in the index.

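A minimal sketch of that element-wise lookup, rebuilding the question's `df`: `pd.to_datetime` turns the strings into a DatetimeIndex of Timestamps, `Index.get_indexer` reports the position of each key (`-1` for a key that isn't there), and `.loc` raises KeyError when a requested label is missing. The sketch sticks to Timestamp keys so it doesn't depend on how a particular pandas version treats a plain list of strings; the names `keys`, `positions` and `missing_raised` are just illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# The index stores Timestamps, not strings.
print(isinstance(df.index[0], pd.Timestamp))  # True

# pd.to_datetime converts the list of strings into a DatetimeIndex.
keys = pd.to_datetime(['20130102', '20130110'])

# get_indexer looks up each key on its own: position 1 for 2013-01-02,
# -1 for 2013-01-10, which is not in the index.
positions = df.index.get_indexer(keys)
print(positions)  # [ 1 -1]

# All keys present: .loc returns those rows.
print(df.loc[keys[:1], :])

# A key is missing: in recent pandas, .loc raises KeyError.
missing_raised = False
try:
    df.loc[keys, :]
except KeyError as err:
    missing_raised = True
    print('KeyError:', err)
```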

However, Pandas also tries to make things easier for you. In particular, with a DatetimeIndex, if you pass a string scalar:

df.loc['20130102']

A    0.0
B    1.0
C    0.0
D    0.0
Name: 2013-01-02 00:00:00, dtype: float64

Pandas will attempt to parse that scalar as a Timestamp and see if that value is in the index.

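The scalar case can be checked directly. In this sketch, again on the question's `df`, the string parses to the same Timestamp that `pd.Timestamp('20130102')` produces, so the string lookup and the Timestamp lookup return the same row, and an `in` test on the index parses the string the same way.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# '20130102' parses to midnight on 2 January 2013.
ts = pd.Timestamp('20130102')
print(ts)  # 2013-01-02 00:00:00

# A string scalar and the equivalent Timestamp select the same row (a Series).
row_from_str = df.loc['20130102']
row_from_ts = df.loc[ts]
print(row_from_str.equals(row_from_ts))  # True

# Membership tests on a DatetimeIndex also parse the string first.
print('20130102' in df.index)  # True
print('20130110' in df.index)  # False
```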

If you pass a slice object:

df.loc['20130102':'20130104']

              A    B    C    D
2013-01-02  0.0  1.0  0.0  0.0
2013-01-03  0.0  0.0  1.0  0.0
2013-01-04  0.0  0.0  0.0  1.0

Pandas will also attempt to parse the endpoints of the slice as Timestamps and return an appropriately sliced DataFrame.

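A sketch of the slice case on the question's `df`: each endpoint string is parsed, and because `.loc` slices by label, both endpoints are included. Writing the endpoints as Timestamps gives the same frame. Since the index is sorted, the endpoints are located by position rather than looked up as exact labels, so an endpoint that isn't in the index (here `'20121230'`, chosen only for illustration) still works, unlike a missing key in a list.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# String endpoints are parsed into Timestamps.
by_str = df.loc['20130102':'20130104']

# The same slice written with explicit Timestamps.
by_ts = df.loc[pd.Timestamp('2013-01-02'):pd.Timestamp('2013-01-04')]

print(by_str.equals(by_ts))  # True
# Label slices include both endpoints: 2 Jan through 4 Jan, three rows.
print(len(by_str))  # 3

# On a sorted index, an endpoint need not be an existing label.
early = df.loc['20121230':'20130102']
print(len(early))  # 2
```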

Your KeyError simply means you've gone past the limits of how much helpfulness the Pandas devs had time to code.