Statistical summary data with a time series

Time: 2022-06-06 23:10:32

I think I'm missing something in my understanding of groupby in pandas. My DataFrame is indexed on 'Date' and has a column called 'Year', so that 2014-10-10 maps to 2014, and so on.

The point is I want to correlate Year 1 data to Year 2 data, and so on. What am I supposed to do with a list of index values?

My input is:

Date         Adj Close  Year
2013-Dec-31  0.16       2013
2013-Dec-30  0.13       2013
2013-Dec-27  0.11       2013
2012-Dec-31  0.1        2012
2012-Dec-28  0.1        2012
2012-Dec-27  0.1        2012
2012-Dec-26  0.1        2012

To compute the correlation, do the years have to sit side by side in one frame, like this?

Date         Adj Close  Year  Date         Adj Close  Year
2012-Dec-31  0.1        2012  2013-Dec-31  0.16       2013
2012-Dec-28  0.1        2012  2013-Dec-30  0.13       2013
2012-Dec-27  0.1        2012  2013-Dec-27  0.11       2013

Do I have to make a new dataframe for each year group and merge them?

1 Answer

#1


There is no merge to be done. All your data is date-specific, so just "stack" the years one after another in a single frame, then group by (month, day) to get the year-over-year correlation for the same calendar day.
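
As a rough sketch of that idea: the prices below are made up for illustration (the question's sample has too few overlapping days to give a meaningful correlation), and the helper columns year, month and day are names introduced here, not part of the original frame. The stacked frame is reshaped so that each (month, day) is a row and each year is a column, after which DataFrame.corr compares the years.

```python
import pandas as pd

# Hypothetical prices (not the question's data): the same calendar days in two years.
df = pd.DataFrame(
    {"Adj Close": [0.10, 0.12, 0.11, 0.15, 0.13, 0.16, 0.14, 0.19]},
    index=pd.to_datetime([
        "2012-12-26", "2012-12-27", "2012-12-28", "2012-12-31",
        "2013-12-26", "2013-12-27", "2013-12-28", "2013-12-31",
    ]),
)

# Derive grouping keys from the DatetimeIndex (column names are illustrative).
keyed = df.assign(year=df.index.year, month=df.index.month, day=df.index.day)

# One row per (month, day), one column per year.
by_day = keyed.pivot_table(index=["month", "day"], columns="year", values="Adj Close")

# Keep only the days present in every year, then correlate the year columns.
year_corr = by_day.dropna().corr()
print(year_corr)
```

Days that are missing in one of the years (holidays, weekends) are dropped before correlating. Whether that is appropriate depends on the data.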

Just make sure that your index is a proper time series (a DatetimeIndex), and pandas will do most of the work for you. Have a look at the time series sections of the manual. Any function that uses the index to weight values or compute distances will handle this correctly if the index is a time series.
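
A minimal sketch of getting such an index, assuming the dates arrive as strings in the question's "2013-Dec-31" style (the explicit format string is an assumption based on that sample, and "%b" expects English month abbreviations):

```python
import pandas as pd

# A few rows in the same shape as the question's input.
df = pd.DataFrame({
    "Date": ["2013-Dec-31", "2013-Dec-30", "2012-Dec-31"],
    "Adj Close": [0.16, 0.13, 0.10],
})

# Parse the strings into timestamps; %b matches abbreviated month names like "Dec".
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%b-%d")

# Use the parsed dates as the index and put them in chronological order.
df = df.set_index("Date").sort_index()
print(type(df.index).__name__)
```

Sorting is not strictly required for every operation, but time-based slicing and lag computations are easier to reason about on a chronologically ordered index.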

Here is the autocorrelation plot for your data, again following the documentation:

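import pandas as pd
from pandas.plotting import autocorrelation_plot  # was pandas.tools.plotting in old versions

# Name the columns explicitly so "Adj Close" is not split on its space
df = pd.read_clipboard(header=0, names=['Date', 'Adj Close', 'Year'])
df.index = pd.DatetimeIndex(df['Date'])
# Plot in chronological order (the input is listed newest first)
autocorrelation_plot(df['Adj Close'].sort_index())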
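
If numbers are wanted rather than a figure, one option is Series.autocorr, sketched here on made-up values. Note that it computes the Pearson correlation between the series and a shifted copy of itself, so the results may differ somewhat from what the plot shows.

```python
import pandas as pd

# Hypothetical daily series (values invented for illustration).
s = pd.Series(
    [0.10, 0.11, 0.13, 0.12, 0.15, 0.16],
    index=pd.date_range("2013-12-23", periods=6, freq="D"),
    name="Adj Close",
)

# Correlation of the series with itself shifted by one observation.
lag1 = s.autocorr(lag=1)

# The same for a few lags at once.
lags = {k: s.autocorr(lag=k) for k in range(1, 4)}
print(lag1, lags)
```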

The output is:

[Figure: autocorrelation plot of the series]

If you want to apply additional filtering, this index even lets you select a specific year with df.loc['2013'] (older pandas versions also accepted df['2013']).
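
For example, on a few rows taken from the question (a sketch, and in recent pandas versions row selection by a partial date string goes through .loc):

```python
import pandas as pd

# A handful of rows from the question, indexed by date in chronological order.
df = pd.DataFrame(
    {"Adj Close": [0.10, 0.10, 0.11, 0.13, 0.16]},
    index=pd.to_datetime([
        "2012-12-28", "2012-12-31", "2013-12-27", "2013-12-30", "2013-12-31",
    ]),
)

# Partial string indexing: every row whose date falls in 2013.
only_2013 = df.loc["2013"]

# The same works at month resolution.
dec_2012 = df.loc["2012-12"]
print(only_2013)
```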
