I have a data frame that contains options information for each date. Each date has multiple rows corresponding to a changing range of strike prices:
head(df)
        Date C/P   K      Vol     Delta       ID
1 01/23/1997   0 805 0.155814  0.234181 10007288
2 01/23/1997   1 790 0.159603 -0.609276 10333499
3 01/23/1997   0 815 0.141776  0.132414 10106825
4 01/23/1997   1 700 0.257233 -0.060976 10012499
5 01/23/1997   1 680 0.279465 -0.035616 10072595
6 01/23/1997   0 730 0.197782  0.888286 10307920
I have 216 dates, and each date has 100-300 rows, one for each strike price. I want to split the data frame by date and, within each date's data frame, sort the rows using C/P as the primary sort key and K as the secondary sort key.
Is plyr the package to use? I've tried split(df, df$Date) but I cannot find any documentation on applying a sorting function to each split data frame.
By primary and secondary sort, I mean:
Input:
C/P K Vol Delta
0 800 0.1 0.11
1 800 0.2 0.22
1 700 0.3 0.33
0 700 0.4 0.44
1 900 0.5 0.55
1 600 0.6 0.66
0 600 0.7 0.77
0 900 0.8 0.88
Output:
C/P K Vol Delta
0 600 0.7 0.77
0 700 0.4 0.44
0 800 0.1 0.11
0 900 0.8 0.88
1 600 0.6 0.66
1 700 0.3 0.33
1 800 0.2 0.22
1 900 0.5 0.55
1 Answer
We can use lapply to loop over the list elements from the split output and then order the rows of each element by the "C/P" and "K" column values:
lapply(split(df, df$Date), function(x)
x[order(x[["C/P"]], x[["K"]]),])
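As a minimal sketch of the split/lapply approach, here is a self-contained run on made-up data. The values, the two sample dates, and the names sorted_by_date and sorted_df are assumptions for illustration, not taken from the question's data set. check.names = FALSE is used so that the "C/P" column name is not rewritten to "C.P".

```r
# Toy data: two dates, column names as in the question (values are made up).
df <- data.frame(
  Date  = c("01/23/1997", "01/23/1997", "01/23/1997",
            "01/24/1997", "01/24/1997", "01/24/1997"),
  `C/P` = c(1, 0, 0, 0, 1, 1),
  K     = c(800, 900, 700, 800, 600, 700),
  Vol   = c(0.2, 0.8, 0.4, 0.1, 0.6, 0.3),
  check.names = FALSE
)

# One data frame per date, each sorted by C/P first and K second.
sorted_by_date <- lapply(split(df, df$Date), function(x) {
  x[order(x[["C/P"]], x[["K"]]), ]
})

# Inspect a single date.
sorted_by_date[["01/23/1997"]]

# Optionally stack the sorted pieces back into one data frame.
sorted_df <- do.call(rbind, sorted_by_date)
```

Note that split orders the list by the sorted values of Date, and with "mm/dd/yyyy" strings that is alphabetical rather than chronological. If chronological order matters, converting first with as.Date(df$Date, format = "%m/%d/%Y") is one option.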
Alternatively, instead of split, any group-by operation will work. With data.table, we convert the 'data.frame' to a 'data.table' (setDT(df)), order the rows by the "C/P" and "K" columns in 'i', and return the Subset of Data.table (.SD) grouped by 'Date':
setDT(df)[order(eval(as.name("C/P")), K), .SD, by = Date]
This may be useful if we are grouping by "Date", ordering by some columns, and then doing some operations on the rest of the columns.
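To illustrate the order-then-operate pattern, the sketch below orders in 'i' and then computes per-date values in 'j'. It assumes the data.table package is installed. The toy values and the names dt, summary_dt, first_vol and n_rows are hypothetical, and backticks are used for the "C/P" column as a simpler alternative to eval(as.name("C/P")).

```r
library(data.table)

# Toy data with the question's column names (values are made up).
dt <- data.table(
  Date  = c("01/23/1997", "01/23/1997", "01/23/1997",
            "01/24/1997", "01/24/1997", "01/24/1997"),
  `C/P` = c(1, 0, 0, 0, 1, 1),
  K     = c(800, 900, 700, 800, 600, 700),
  Vol   = c(0.2, 0.8, 0.4, 0.1, 0.6, 0.3)
)

# Order rows by C/P, then K; within each Date take the Vol of the
# first ordered row and count the rows.
summary_dt <- dt[order(`C/P`, K),
                 .(first_vol = Vol[1], n_rows = .N),
                 by = Date]
```

Because the ordering in 'i' is applied before grouping, Vol[1] within each date is the Vol of the row with the smallest C/P and, among those, the smallest K.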