With weather or stock market data, a value (a temperature or a price) is recorded for each of several stations or stock tickers on any given date.
What is the most effective way to set an index that contains two fields?
For weather: the weather_station and then Date
For Stock Data: the stock_code and then Date
Setting the index in this way would allow filtering such as:
stock_df["code"]["start_date":"end_date"]
weather_df["station"]["start_date":"end_date"]
2 solutions
#1
1
As mentioned by Anton, you need to use a MultiIndex, as follows:
stock_df.index = pd.MultiIndex.from_arrays(stock_df[['code', 'date']].values.T, names=['idx1', 'idx2'])
weather_df.index = pd.MultiIndex.from_arrays(weather_df[['station', 'date']].values.T, names=['idx1', 'idx2'])
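For what it's worth, the same kind of two-level index can probably be built more directly with `set_index` and a list of columns. The following is only a sketch: the sample rows are made up, and the column names `code`, `date` and `price` are assumed from the question rather than taken from real data.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
stock_df = pd.DataFrame({
    "code": ["AAPL", "AAPL", "F", "F"],
    "date": ["2016-01-01", "2016-01-02", "2016-01-01", "2016-01-02"],
    "price": [100.0, 101.0, 50.0, 47.5],
})

# Parse the dates first so the second index level holds real timestamps.
stock_df["date"] = pd.to_datetime(stock_df["date"])

# Passing a list of columns to set_index builds the MultiIndex and
# moves those columns out of the data; sort_index keeps later slicing safe.
indexed = stock_df.set_index(["code", "date"]).sort_index()

print(indexed.index.names)
print(indexed.loc["AAPL"])
```

Compared with `from_arrays` on `.values.T`, this keeps each level's own dtype and names the levels after the original columns.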
#2
0
That functionality currently exists. Please refer to the documentation for more examples.
import pandas as pd
stock_df = pd.DataFrame({'symbol': ['AAPL', 'AAPL', 'F', 'F', 'F'],
                         'date': ['2016-1-1', '2016-1-2', '2016-1-1', '2016-1-2', '2016-1-3'],
                         'price': [100., 101, 50, 47.5, 49]}).set_index(['symbol', 'date'])
>>> stock_df
                 price
symbol date
AAPL   2016-1-1  100.0
       2016-1-2  101.0
F      2016-1-1   50.0
       2016-1-2   47.5
       2016-1-3   49.0
>>> stock_df.loc['AAPL']
          price
date
2016-1-1    100
2016-1-2    101
>>> stock_df.loc['AAPL', '2016-1-2']
price 101
Name: (AAPL, 2016-1-2), dtype: float64
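The question also asks about selecting a date range for one code, which the lookups above do not show. A minimal sketch of one way to do it, assuming the dates are parsed with `pd.to_datetime` and the index is sorted (the variable names `f_range` and `same_range` are just for illustration):

```python
import pandas as pd

# Same sample data as above, but with the dates parsed as timestamps.
stock_df = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "F", "F", "F"],
    "date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-01",
                            "2016-01-02", "2016-01-03"]),
    "price": [100.0, 101.0, 50.0, 47.5, 49.0],
}).set_index(["symbol", "date"]).sort_index()

# Option 1: pd.IndexSlice selects one symbol and a date range in a single .loc call.
idx = pd.IndexSlice
f_range = stock_df.loc[idx["F", "2016-01-02":"2016-01-03"], :]

# Option 2: pick the symbol first, then slice the remaining date index.
same_range = stock_df.loc["F"].loc["2016-01-02":"2016-01-03"]

print(f_range)
print(same_range)
```

The first form keeps both index levels in the result, while the second drops the symbol level. Sorting the index beforehand matters because slicing an unsorted MultiIndex can raise an error.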