I think I'm missing something in my understanding of groupby in pandas. My DataFrame is indexed on 'Date' and has a column called 'Year', where 2014-10-10 is 2014, etc.
The point is I want to correlate Year 1 data to Year 2 data, and so on. What am I supposed to do with a list of index values?
My input is:
Date Adj Close Year
2013-Dec-31 0.16 2013
2013-Dec-30 0.13 2013
2013-Dec-27 0.11 2013
2012-Dec-31 0.1 2012
2012-Dec-28 0.1 2012
2012-Dec-27 0.1 2012
2012-Dec-26 0.1 2012
To do the correlation, do the frames have to be side by side, like this?
Date Adj Close Year Date Adj Close Year
2012-Dec-31 0.1 2012 2013-Dec-31 0.16 2013
2012-Dec-28 0.1 2012 2013-Dec-30 0.13 2013
2012-Dec-27 0.1 2012 2013-Dec-27 0.11 2013
Do I have to make a new dataframe for each year group and merge them?
1 Solution
#1
No merge is needed. All your data is date-specific, so just "stack" the years one after another (as in your input), and then group by (month, day) to get the year-over-year correlation for the same calendar day.
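As a rough sketch of the "group by month, day" idea: one way to do it (my assumption, not necessarily what the answer had in mind) is to pivot the stacked data so each row is a (month, day) pair and each column is a year, then call `corr()`. The prices below are made up for illustration; the question's own 2012 values are all 0.1, which would give an undefined correlation.

```python
import pandas as pd

# Hypothetical prices in the same layout as the question (values are invented).
raw = pd.DataFrame({
    "Date": ["2012-12-26", "2012-12-27", "2012-12-28", "2012-12-31",
             "2013-12-26", "2013-12-27", "2013-12-30", "2013-12-31"],
    "Adj Close": [0.10, 0.12, 0.11, 0.15,
                  0.11, 0.13, 0.13, 0.16],
})
raw["Date"] = pd.to_datetime(raw["Date"])

# Split each date into the pieces we group on.
stacked = raw.assign(
    year=raw["Date"].dt.year,
    month=raw["Date"].dt.month,
    day=raw["Date"].dt.day,
)

# One row per calendar day, one column per year; days missing in a year become NaN.
by_day = stacked.pivot_table(index=["month", "day"], columns="year", values="Adj Close")

# Pairwise correlation between years, using only the days both years have.
year_corr = by_day.corr()
print(by_day)
print(year_corr)
```

Note that days which only trade in one of the years (Dec 28 and Dec 30 here) are dropped from that pair's correlation, so the result depends on how many calendar days the years share.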
Just make sure that your index is a proper time series, and pandas will do the rest for you. That is, have a look at this and this in the manual. Any function that uses the index to weight against or compute distances will automatically do this correctly if your index is a time series.
Here I plot the autocorrelation for you, again following the documentation:
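import pandas as pd
from pandas.plotting import autocorrelation_plot  # formerly pandas.tools.plotting

df = pd.read_clipboard()  # parse the table copied from the question
df.index = pd.DatetimeIndex(df.Date)  # make the index a proper time series
# The column name depends on how the header was parsed; use 'Adj Close' if it kept its full name.
autocorrelation_plot(df['Adj'])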
and the output is the autocorrelation plot (figure not preserved).
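If you would rather have a number than a picture, a minimal sketch is to call `Series.autocorr` on the price column. The frame below is rebuilt by hand from the question's rows (an assumption on my part, instead of `read_clipboard`), and the lag of 1 is just an example.

```python
import pandas as pd

# Rows from the question, typed in by hand and sorted oldest first.
prices = pd.Series(
    [0.10, 0.10, 0.10, 0.10, 0.11, 0.13, 0.16],
    index=pd.to_datetime([
        "2012-12-26", "2012-12-27", "2012-12-28", "2012-12-31",
        "2013-12-27", "2013-12-30", "2013-12-31",
    ]),
    name="Adj Close",
)

# Correlation of the series with itself shifted by one observation.
lag1 = prices.autocorr(lag=1)
print(lag1)
```

Keep in mind the lag counts observations, not calendar days, so with gaps in the data "lag 1" is not always "one day earlier".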
If you want to apply additional filtering with this index, you can even select a specific year using df.loc['2013'] (older pandas versions also accepted df['2013']).
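For example, here is a small sketch of selecting a year by partial string on a DatetimeIndex. The frame is built by hand from the question's rows rather than from the clipboard, and the `format` string is my guess at how those dates are written.

```python
import pandas as pd

# Rows from the question, typed in by hand.
df = pd.DataFrame({
    "Date": ["2013-Dec-31", "2013-Dec-30", "2013-Dec-27",
             "2012-Dec-31", "2012-Dec-28", "2012-Dec-27", "2012-Dec-26"],
    "Adj Close": [0.16, 0.13, 0.11, 0.10, 0.10, 0.10, 0.10],
})

# Parse the dates and use them as a sorted DatetimeIndex.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%b-%d")
df = df.set_index("Date").sort_index()

# Partial string indexing: a bare year selects every row in that year.
rows_2013 = df.loc["2013"]
rows_2012 = df.loc["2012"]
print(rows_2013)
```

With this in place the separate 'Year' column is not really needed, since the year is always available as `df.index.year`.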