R: split a data frame, then apply a function (sort) to each split

Time: 2021-08-10 21:16:19

I have a data frame that contains options information for each date. Each date has multiple rows corresponding to a changing range of strike prices:

head(df)
       Date C/P   K      Vol     Delta       ID
1 01/23/1997   0 805 0.155814  0.234181 10007288
2 01/23/1997   1 790 0.159603 -0.609276 10333499
3 01/23/1997   0 815 0.141776  0.132414 10106825 
4 01/23/1997   1 700 0.257233 -0.060976 10012499
5 01/23/1997   1 680 0.279465 -0.035616 10072595
6 01/23/1997   0 730 0.197782  0.888286 10307920

I have 216 dates, and each date has 100-300 rows, one for each strike price. I want to split the data frame by date and, within each date's data frame, sort the rows using C/P as the primary sort key and K as the secondary sort key.

Is plyr the package to use? I've tried split(df, df$Date) but I cannot find any documentation on applying a sorting function to each split data frame.

By primary and secondary sort, I mean:

Input:
C/P K   Vol Delta
0   800 0.1 0.11
1   800 0.2 0.22
1   700 0.3 0.33
0   700 0.4 0.44
1   900 0.5 0.55
1   600 0.6 0.66
0   600 0.7 0.77
0   900 0.8 0.88

Output:
C/P K   Vol Delta
0   600 0.7 0.77
0   700 0.4 0.44
0   800 0.1 0.11
0   900 0.8 0.88
1   600 0.6 0.66
1   700 0.3 0.33
1   800 0.2 0.22
1   900 0.5 0.55

1 solution

#1


We can use lapply to loop over the list elements returned by split and then order the rows of each element by the "C/P" and "K" columns:

lapply(split(df, df$Date), function(x) 
            x[order(x[["C/P"]], x[["K"]]),])
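A minimal sketch of the approach above on made-up data. The toy data frame and the names sorted_list and sorted_df are illustrative and not from the question; binding the pieces back together with do.call(rbind, ...) is optional and assumes a single data frame is wanted at the end:

```r
# Toy data: two dates with a few strikes each (values invented for illustration).
# check.names = FALSE keeps the non-syntactic column name "C/P" as-is.
df <- data.frame(
  Date  = c("01/23/1997", "01/23/1997", "01/23/1997",
            "01/24/1997", "01/24/1997", "01/24/1997"),
  `C/P` = c(1, 0, 0, 1, 0, 1),
  K     = c(800, 900, 700, 600, 800, 500),
  check.names = FALSE
)

# One data frame per date, each sorted by C/P (primary) and K (secondary)
sorted_list <- lapply(split(df, df$Date), function(x)
  x[order(x[["C/P"]], x[["K"]]), ])

# Optional: stack the sorted pieces back into one data frame
sorted_df <- do.call(rbind, sorted_list)
rownames(sorted_df) <- NULL
```

sorted_list stays a named list keyed by date, so a single date can be pulled out with sorted_list[["01/23/1997"]].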

Alternatively, instead of split, any group-by operation can be used. With data.table, we convert the 'data.frame' to a 'data.table' (setDT(df)), order by the "C/P" and "K" columns in 'i', and return the Subset of Data.table (.SD) grouped by 'Date':

setDT(df)[order(eval(as.name("C/P")), K), .SD, by = Date]

This may be useful if we are grouping by "Date", ordering by those columns, and then performing some operations on the remaining columns.
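As a rough illustration of that last point, here is a sketch that orders in 'i' and then summarises per date. It assumes the data.table package is installed; the toy data and the summary columns n and first_K are hypothetical and only show the pattern:

```r
library(data.table)

# Toy data (invented values); data.table() keeps the name "C/P" unchanged
dt <- data.table(
  Date  = c("01/23/1997", "01/23/1997", "01/23/1997",
            "01/24/1997", "01/24/1997"),
  `C/P` = c(1, 0, 0, 1, 0),
  K     = c(800, 900, 700, 600, 800)
)

# Order by C/P then K in 'i', then compute per-date summaries in 'j':
# n is the row count, first_K is the strike of the first row after sorting
res <- dt[order(`C/P`, K), .(n = .N, first_K = K[1]), by = Date]
```

Because the ordering in 'i' is applied before grouping, K[1] within each date is the lowest strike among the rows with the smallest C/P value.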
