如何使用R中的plyr包创建与给定列的级别对应的新数据框列？

I am looking for an "elegant" way to basically split a data frame by the levels of one column variable, then create a new output data frame reshaped to now drop the factor variable and add new columns for the levels of the factor variable. I can do this with functions such as the split() method, but this seems to be the messy way to me. I have been trying to do this using the melt() and cast() functions in the plyr package, but haven't been successful in getting the exact output I need.

我正在寻找一种“优雅”的方式来基本上将数据框分割为一个列变量的级别,然后创建一个新的输出数据框,重新设置为现在删除因子变量并为因子变量的级别添加新列。我可以使用split()方法这样的函数来做到这一点,但这对我来说似乎是一种混乱的方式。我一直在尝试使用plyr包中的melt()和cast()函数来做到这一点,但是还没有成功获得我需要的确切输出。

Here is what my data looks like:

这是我的数据:

> jumbo.df = read.csv(...)
> head(jumbo.df)
         PricingDate  Name     Rate 
    186  2012-03-05   Type A   2.875  
    187  2012-03-05   Type B   3.250  
    188  2012-03-05   Type C   3.750  
    189  2012-03-05   Type D   3.750  
    190  2012-03-05   Type E   4.500  
    191  2012-03-06   Type A   2.875

What I would like to do is split by the variable name, remove Name and Rate, then output columns for Type A, Type B, Type C, Type D, and Type E with the corresponding Rate series with Date as ID:

我想要做的是通过变量名称拆分,删除名称和速率,然后输出类型A,类型B,类型C,类型D和类型E的列,其中相应的费率系列为日期作为ID:

> head(output.df)
         PricingDate  Type A   Type B    Type C    Type D    Type E 
         2012-03-05    2.875    3.250     3.750     3.750     4.500  
         2012-03-06    2.875    ...

Thanks!

2 个解决方案

#1

Not sure if I get you right, but could it be that you just want to reshape your data into the wide format? If so, you have to use the melt and cast functions of the reshape (!) package. reshape2 is basically the same. Since your data is already in the molten format, i.e. the long format, a one-liner does what you want:

不确定我是否帮助您,但是您是否只想将数据重塑为宽幅格式?如果是这样,您必须使用重塑(!)包的熔化和浇铸功能。 reshape2基本相同。由于您的数据已经处于熔融格式,即长格式,因此单行执行您想要的操作:

df <- read.table(textConnection("PricingDate  Name     Rate 
                                 2012-03-05   TypeA   2.875  
                                 2012-03-05   TypeB   3.250  
                                 2012-03-05   TypeC   3.750  
                                 2012-03-05   TypeD   3.750  
                                 2012-03-05   TypeE   4.500  
                                 2012-03-06   TypeA   2.875"), header=TRUE, row.names=NULL)
library(reshape2)
dcast(df, PricingDate ~ Name)
Using Rate as value column: use value.var to override.
  PricingDate TypeA TypeB TypeC TypeD TypeE
1  2012-03-05 2.875  3.25  3.75  3.75   4.5
2  2012-03-06 2.875    NA    NA    NA    NA

#2

library(plyr)
library(reshape2)

    data <- structure(list(PricingDate = c("2012-03-05", "2012-03-05", "2012-03-05", 
    "2012-03-05", "2012-03-05", "2012-03-06", "2012-03-06", "2012-03-06", 
    "2012-03-06", "2012-03-06"), Name = c("Type A", "Type B", "Type C", 
    "Type D", "Type E", "Type A", "Type B", "Type C", "Type D", "Type E"
    ), Rate = c(2.875, 3.25, 3.75, 3.75, 4.5, 4.875, 5.25, 6.75, 
    7.75, 8.5)), .Names = c("PricingDate", "Name", "Rate"), class = "data.frame", row.names = c("186", 
    "187", "188", "189", "190", "191", "192", "193", "194", "195"
    ))


    > data
        PricingDate   Name  Rate
    186  2012-03-05 Type A 2.875
    187  2012-03-05 Type B 3.250
    188  2012-03-05 Type C 3.750
    189  2012-03-05 Type D 3.750
    190  2012-03-05 Type E 4.500
    191  2012-03-06 Type A 4.875
    192  2012-03-06 Type B 5.250
    193  2012-03-06 Type C 6.750
    194  2012-03-06 Type D 7.750
    195  2012-03-06 Type E 8.500


    ddply(data, .(PricingDate), function(x) reshape(x, idvar="PricingDate", timevar="Name", direction="wide"))



    PricingDate Rate.Type A Rate.Type B Rate.Type C Rate.Type D
    1  2012-03-05       2.875        3.25        3.75        3.75
    2  2012-03-06       4.875        5.25        6.75        7.75
      Rate.Type E
    1         4.5
    2         8.5

#1

df <- read.table(textConnection("PricingDate  Name     Rate 
                                 2012-03-05   TypeA   2.875  
                                 2012-03-05   TypeB   3.250  
                                 2012-03-05   TypeC   3.750  
                                 2012-03-05   TypeD   3.750  
                                 2012-03-05   TypeE   4.500  
                                 2012-03-06   TypeA   2.875"), header=TRUE, row.names=NULL)
library(reshape2)
dcast(df, PricingDate ~ Name)
Using Rate as value column: use value.var to override.
  PricingDate TypeA TypeB TypeC TypeD TypeE
1  2012-03-05 2.875  3.25  3.75  3.75   4.5
2  2012-03-06 2.875    NA    NA    NA    NA

#2

library(plyr)
library(reshape2)

    data <- structure(list(PricingDate = c("2012-03-05", "2012-03-05", "2012-03-05", 
    "2012-03-05", "2012-03-05", "2012-03-06", "2012-03-06", "2012-03-06", 
    "2012-03-06", "2012-03-06"), Name = c("Type A", "Type B", "Type C", 
    "Type D", "Type E", "Type A", "Type B", "Type C", "Type D", "Type E"
    ), Rate = c(2.875, 3.25, 3.75, 3.75, 4.5, 4.875, 5.25, 6.75, 
    7.75, 8.5)), .Names = c("PricingDate", "Name", "Rate"), class = "data.frame", row.names = c("186", 
    "187", "188", "189", "190", "191", "192", "193", "194", "195"
    ))


    > data
        PricingDate   Name  Rate
    186  2012-03-05 Type A 2.875
    187  2012-03-05 Type B 3.250
    188  2012-03-05 Type C 3.750
    189  2012-03-05 Type D 3.750
    190  2012-03-05 Type E 4.500
    191  2012-03-06 Type A 4.875
    192  2012-03-06 Type B 5.250
    193  2012-03-06 Type C 6.750
    194  2012-03-06 Type D 7.750
    195  2012-03-06 Type E 8.500


    ddply(data, .(PricingDate), function(x) reshape(x, idvar="PricingDate", timevar="Name", direction="wide"))



    PricingDate Rate.Type A Rate.Type B Rate.Type C Rate.Type D
    1  2012-03-05       2.875        3.25        3.75        3.75
    2  2012-03-06       4.875        5.25        6.75        7.75
      Rate.Type E
    1         4.5
    2         8.5

秒客网

如何使用R中的plyr包创建与给定列的级别对应的新数据框列？

2 个解决方案

#1

#2

#1

#2

相关文章