I am looking for an "elegant" way to split a data frame by the levels of one factor column and reshape it into a new data frame that drops the factor column and adds one new column per factor level. I can do this with functions such as split(), but that seems messy to me. I have been trying to do this with the melt() and cast() functions in the plyr package, but haven't managed to get the exact output I need.
Here is what my data looks like:
> jumbo.df = read.csv(...)
> head(jumbo.df)
PricingDate Name Rate
186 2012-03-05 Type A 2.875
187 2012-03-05 Type B 3.250
188 2012-03-05 Type C 3.750
189 2012-03-05 Type D 3.750
190 2012-03-05 Type E 4.500
191 2012-03-06 Type A 2.875
What I would like to do is split by the Name variable, drop the Name and Rate columns, and output one column each for Type A, Type B, Type C, Type D, and Type E holding the corresponding Rate series, with PricingDate as the ID:
> head(output.df)
PricingDate Type A Type B Type C Type D Type E
2012-03-05 2.875 3.250 3.750 3.750 4.500
2012-03-06 2.875 ...
Thanks!
2 answers
#1
4
Not sure if I understand you correctly, but could it be that you just want to reshape your data into the wide format? If so, you have to use the melt and cast functions of the reshape (!) package; reshape2 is basically the same. Since your data is already in the molten format, i.e. the long format, a one-liner does what you want:
df <- read.table(textConnection("PricingDate Name Rate
2012-03-05 TypeA 2.875
2012-03-05 TypeB 3.250
2012-03-05 TypeC 3.750
2012-03-05 TypeD 3.750
2012-03-05 TypeE 4.500
2012-03-06 TypeA 2.875"), header=TRUE, row.names=NULL)
library(reshape2)
dcast(df, PricingDate ~ Name)
Using Rate as value column: use value.var to override.
PricingDate TypeA TypeB TypeC TypeD TypeE
1 2012-03-05 2.875 3.25 3.75 3.75 4.5
2 2012-03-06 2.875 NA NA NA NA
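The message above suggests naming the value column explicitly. Here is a minimal sketch of the same one-liner with value.var set, assuming reshape2 is installed; the data frame name jumbo is made up for illustration, and the rows are the ones shown in the question, with the spaces in Name left in place:

```r
library(reshape2)

# Sample rows copied from the question; Name values keep their spaces.
jumbo <- data.frame(
  PricingDate = c(rep("2012-03-05", 5), "2012-03-06"),
  Name = c("Type A", "Type B", "Type C", "Type D", "Type E", "Type A"),
  Rate = c(2.875, 3.25, 3.75, 3.75, 4.5, 2.875),
  stringsAsFactors = FALSE
)

# One row per PricingDate, one column per Name, filled with Rate.
# Naming value.var avoids relying on dcast() guessing the value column.
wide <- dcast(jumbo, PricingDate ~ Name, value.var = "Rate")
print(wide)
```

Date and type combinations that are missing from the long data come out as NA in the wide result, as in the output above.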
#2
1
library(plyr)
library(reshape2)
data <- structure(list(PricingDate = c("2012-03-05", "2012-03-05", "2012-03-05",
"2012-03-05", "2012-03-05", "2012-03-06", "2012-03-06", "2012-03-06",
"2012-03-06", "2012-03-06"), Name = c("Type A", "Type B", "Type C",
"Type D", "Type E", "Type A", "Type B", "Type C", "Type D", "Type E"
), Rate = c(2.875, 3.25, 3.75, 3.75, 4.5, 4.875, 5.25, 6.75,
7.75, 8.5)), .Names = c("PricingDate", "Name", "Rate"), class = "data.frame", row.names = c("186",
"187", "188", "189", "190", "191", "192", "193", "194", "195"
))
> data
PricingDate Name Rate
186 2012-03-05 Type A 2.875
187 2012-03-05 Type B 3.250
188 2012-03-05 Type C 3.750
189 2012-03-05 Type D 3.750
190 2012-03-05 Type E 4.500
191 2012-03-06 Type A 4.875
192 2012-03-06 Type B 5.250
193 2012-03-06 Type C 6.750
194 2012-03-06 Type D 7.750
195 2012-03-06 Type E 8.500
ddply(data, .(PricingDate), function(x) reshape(x, idvar="PricingDate", timevar="Name", direction="wide"))
  PricingDate Rate.Type A Rate.Type B Rate.Type C Rate.Type D Rate.Type E
1  2012-03-05       2.875        3.25        3.75        3.75         4.5
2  2012-03-06       4.875        5.25        6.75        7.75         8.5
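For this data the ddply() wrapper does not appear to be needed: base R's reshape() can be applied to the whole data frame at once. The sketch below assumes the same ten rows as above and, as an optional extra step that is not part of the answer, strips the "Rate." prefix so the column names match the ones asked for in the question:

```r
# Same ten rows as in the answer above, rebuilt without structure().
data <- data.frame(
  PricingDate = rep(c("2012-03-05", "2012-03-06"), each = 5),
  Name = rep(paste("Type", LETTERS[1:5]), times = 2),
  Rate = c(2.875, 3.25, 3.75, 3.75, 4.5, 4.875, 5.25, 6.75, 7.75, 8.5),
  stringsAsFactors = FALSE
)

# Long to wide: one row per PricingDate, one Rate.<Name> column per Name.
wide <- reshape(data, idvar = "PricingDate", timevar = "Name",
                direction = "wide")

# Drop the "Rate." prefix that reshape() adds to each new column,
# and reset the row names inherited from the long data.
names(wide) <- sub("^Rate\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

This needs no extra packages, at the cost of the slightly awkward idvar/timevar vocabulary.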