Setting two columns as the index of a pandas DataFrame for time series analysis

Time: 2021-08-26 15:46:20

With weather or stock market data, temperatures and stock prices are measured at multiple stations, or for multiple stock tickers, on any given date.

Therefore, what is the most effective way to set an index that contains two fields?

For weather: the weather_station and then Date

For Stock Data: the stock_code and then Date

Setting the index in this way would allow filtering such as:

  • stock_df["code"]["start_date":"end_date"]
  • weather_df["station"]["start_date":"end_date"]

2 solutions

#1


1  

As mentioned by Anton you need to use MultiIndex as follows:

stock_df.index = pd.MultiIndex.from_arrays(stock_df[['code', 'date']].values.T, names=['idx1', 'idx2'])

weather_df.index = pd.MultiIndex.from_arrays(weather_df[['station', 'date']].values.T, names=['idx1', 'idx2'])
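
The same two-level index can probably be built more directly with set_index. The following is only a sketch: the sample rows are made up for illustration, the column names code and date follow the answer above, and it assumes the date column holds parseable date strings.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
stock_df = pd.DataFrame({
    "code": ["AAPL", "AAPL", "F", "F"],
    "date": ["2016-01-01", "2016-01-02", "2016-01-01", "2016-01-02"],
    "price": [100.0, 101.0, 50.0, 47.5],
})

# Parse the dates first so the second index level holds real datetimes.
stock_df["date"] = pd.to_datetime(stock_df["date"])

# set_index builds the MultiIndex from the two columns and keeps their names;
# sort_index sorts it, which label-based slicing on a MultiIndex generally needs.
stock_df = stock_df.set_index(["code", "date"]).sort_index()

print(stock_df.index.names)
```

The weather data would follow the same pattern with ['station', 'date'].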

#2


0  

pandas already supports this: pass a list of column names to set_index. Please refer to the documentation for more examples.

import pandas as pd

# Build the frame, then move 'symbol' and 'date' into a two-level index.
stock_df = pd.DataFrame({'symbol': ['AAPL', 'AAPL', 'F', 'F', 'F'], 
                         'date': ['2016-1-1', '2016-1-2', '2016-1-1', '2016-1-2', '2016-1-3'], 
                         'price': [100., 101, 50, 47.5, 49]}).set_index(['symbol', 'date'])

>>> stock_df
                 price
symbol date           
AAPL   2016-1-1  100.0
       2016-1-2  101.0
F      2016-1-1   50.0
       2016-1-2   47.5
       2016-1-3   49.0

>>> stock_df.loc['AAPL']
          price
date           
2016-1-1  100.0
2016-1-2  101.0

>>> stock_df.loc['AAPL', '2016-1-2']
price    101.0
Name: (AAPL, 2016-1-2), dtype: float64
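
The question also asks for filtering by a date range within one code or station. A minimal sketch of how that might look, assuming the dates are parsed with pd.to_datetime and the index is sorted (the zero-padded date strings and the variable names f_range, f_range_2 and all_range are illustrative, not part of the answer above):

```python
import pandas as pd

stock_df = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "F", "F", "F"],
    "date": pd.to_datetime(["2016-01-01", "2016-01-02",
                            "2016-01-01", "2016-01-02", "2016-01-03"]),
    "price": [100.0, 101.0, 50.0, 47.5, 49.0],
}).set_index(["symbol", "date"]).sort_index()

# Option 1: select one symbol, then slice the remaining date index.
f_range = stock_df.loc["F"].loc["2016-01-02":"2016-01-03"]

# Option 2: select the symbol and the date range in one step.
idx = pd.IndexSlice
f_range_2 = stock_df.loc[idx["F", "2016-01-02":"2016-01-03"], :]

# Option 3: the same date range across every symbol.
all_range = stock_df.loc[idx[:, "2016-01-02":"2016-01-03"], :]

print(f_range)
print(all_range)
```

Option 1 drops the symbol level from the result, while options 2 and 3 keep both index levels. Both ends of a label slice are inclusive.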
