I have a the following dataset :
我有以下数据集:
> head(Dataset)
IDCLIENT NUMDOSS PROV_200112 PROV_200212 PROV_200312 ....
H44879 0104957 0 0 0 ....
Z353025 0105289 0 0 0 ....
B416771 0105309 0 0 0 ....
B5043 0105366 0 0 0 ....
A725471 0105370 0 0 0 ....
BE406627 0105371 0 0 0 ....
This a table of 125 columns and 250 000 obs
这是一个由125列和25万obs组成的表格
We have a table with two ID columns ID1, ID2 and the other columns register the values of a variables called Prov in a certain period of time from 2003 to 2017.
我们有一个表,其中有两个ID列ID1、ID2和其他列,它们在2003年至2017年的某一段时间内记录一个名为Prov的变量的值。
What I am trying to do is to transform this table so that I will have only 4 columns ( ID1, ID2, Date(or period) and Prov ) instead of 125 ones. Something like this :
我要做的是转换这个表,这样我将只有4列(ID1、ID2、Date(或period)和Prov),而不是125列。是这样的:
>head(DF)
IDClient IDDossier Date Prov
B416771 0104957 PROV_200110 5
B416771 0104957 PROV_200111 0
B416771 0104957 PROV_200112 99
B416771 0104957 PROV_200212 1,23
H44879 0105289 PROV_200212 36,1
Z353025 0105309 PROV_200312 10436,175
BE4410 0105366 PROV_200406 10438,9
to do so I tried the following code, but it's really time consuming and I also got some errors/ warnings message from R:
为此,我尝试了下面的代码,但这真的很耗时,而且我还收到了一些来自R的错误/警告消息:
# N= dim(ProvTab)[1]*dim(ProvTab)[2]
# DF=data.frame(IDClient=rep("",N),IDDossier=rep("",N),
# Date=rep("",N), Prov=rep("",N), stringsAsFactors = F)
DF=NULL
periodd=as.data.frame(colnames(ProvTab))
start.time= Sys.time() # to count running time !
for (k in 3:ncol(ProvTab))
{
for (j in 1:nrow(ProvTab))
{
DF=rbind(DF,
data.frame(IDClient=ProvTab[j,1], IDDossier=ProvTab[j,2],
Date = periodd[k,1] , Prov=ProvTab[j,k]))
}
}
end.time= Sys.time()
end.time- start.time
I really tried to look for other solutions but failed.
我真的试图寻找其他的解决方法,但失败了。
PS: R or SQL codes are welcomed !
欢迎输入R或SQL代码!
Any suggestions ?
有什么建议吗?
1 个解决方案
#1
2
I think this is a pretty simple wide to long transformation. The reshape2
package is great for this.
我认为这是一个相当简单的宽变换到长变换。reshape2包很适合这个。
require(reshape2)
DF <- melt(Dataset, id.vars= c("IDCLIENT", "NUMDOSS"))
names(DF) <- c("IDClient", "IDDossier", "Date", "Prov")
If that's slow, you can try with data.table
instead which is probably faster.
如果很慢,你可以尝试数据。而表格可能更快。
require(data.table)
setDT(Dataset)
DF <- melt(Dataset, id.vars= c("IDCLIENT", "NUMDOSS"))
#1
2
I think this is a pretty simple wide to long transformation. The reshape2
package is great for this.
我认为这是一个相当简单的宽变换到长变换。reshape2包很适合这个。
require(reshape2)
DF <- melt(Dataset, id.vars= c("IDCLIENT", "NUMDOSS"))
names(DF) <- c("IDClient", "IDDossier", "Date", "Prov")
If that's slow, you can try with data.table
instead which is probably faster.
如果很慢,你可以尝试数据。而表格可能更快。
require(data.table)
setDT(Dataset)
DF <- melt(Dataset, id.vars= c("IDCLIENT", "NUMDOSS"))