I need help converting a data frame column of codes into one column per code, similar to the contrast (dummy) columns R builds. For example:
code <- data.frame(code = c('R1111', 'R1112', 'R1111', 'R1111', 'R1113',
'R1112', 'R1112', 'R1112', 'R1113', 'R1115'))
I need to convert this to the following table:
code R1111 R1112 R1113 R1115
1 R1111 1 0 0 0
2 R1112 0 1 0 0
3 R1111 2 0 0 0
4 R1111 3 0 0 0
5 R1113 0 0 1 0
6 R1112 0 2 0 0
7 R1112 0 3 0 0
8 R1112 0 4 0 0
9 R1113 0 0 2 0
10 R1115 0 0 0 1
I have 1400 rows of such codes to convert. Note that within each code's column, the value increases by one each time that code appears. I tried to do this with reshape2, but I keep getting errors and haven't been able to figure it out. How can I get this result?
5 solutions
#1
2
One option is to use mapply in combination with ifelse to get the desired result:
cbind(code,mapply(function(x){
ifelse(code$code==x,cumsum(code$code==x),0)
}, unique(as.character(code$code))))
# code R1111 R1112 R1113 R1115
# 1 R1111 1 0 0 0
# 2 R1112 0 1 0 0
# 3 R1111 2 0 0 0
# 4 R1111 3 0 0 0
# 5 R1113 0 0 1 0
# 6 R1112 0 2 0 0
# 7 R1112 0 3 0 0
# 8 R1112 0 4 0 0
# 9 R1113 0 0 2 0
# 10 R1115 0 0 0 1
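To see why this works, it can help to look at a single code in isolation. The following is a minimal sketch on the question's data; the names `codes`, `hit`, `running`, and `r1112` are illustrative and do not come from the answer above.

```r
# Same data as in the question, kept as a plain character vector.
codes <- c("R1111", "R1112", "R1111", "R1111", "R1113",
           "R1112", "R1112", "R1112", "R1113", "R1115")

hit     <- codes == "R1112"          # TRUE on rows holding this code
running <- cumsum(hit)               # running count of matches up to each row
r1112   <- ifelse(hit, running, 0)   # keep the count on matching rows only

r1112
# [1] 0 1 0 0 0 2 3 4 0 0
```

mapply simply repeats these three steps once per distinct code and binds the resulting vectors together as columns.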
#2
1
You can use model.matrix to generate the dummy (indicator) matrix, then multiply it by the running count of each code.
# running count per code: base R version (commented out) or, more succinctly, data.table::rowid
# code$tag = with(code, as.numeric(ave(as.character(code), code,
# FUN=function(x) cumsum(duplicated(x))+1L)))
code$tag = data.table::rowid(code$code)
model.matrix(~ 0 + code, data=code)* code$tag
# codeR1111 codeR1112 codeR1113 codeR1115
# 1 1 0 0 0
# 2 0 1 0 0
# 3 2 0 0 0
# 4 3 0 0 0
# 5 0 0 1 0
# 6 0 2 0 0
# 7 0 3 0 0
# 8 0 4 0 0
# 9 0 0 2 0
# 10 0 0 0 1
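If data.table is not available, the running count per code can also be computed in base R with ave(). This is a sketch under the assumption that `code` is the data frame from the question; `res` is an illustrative name, and `seq_along` numbers the rows within each group.

```r
code <- data.frame(code = c("R1111", "R1112", "R1111", "R1111", "R1113",
                            "R1112", "R1112", "R1112", "R1113", "R1115"))

# Running count of each code: 1 for its first occurrence, 2 for its second, ...
code$tag <- ave(seq_along(code$code), code$code, FUN = seq_along)

# tag has one entry per row, so it recycles down the columns and
# scales each row of the indicator matrix by that row's count.
res <- model.matrix(~ 0 + code, data = code) * code$tag
```

The multiplication relies on R recycling a vector down the columns of a matrix, which is why `tag` must have exactly one entry per row.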
#3
0
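A base R approach. It raises warnings because 1:length(x) is longer than the number of rows being replaced; only the leading values are used, so the result is still correct and the warnings can be ignored: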
x <- code$code
y <- rep(0, length(x))
DF <- data.frame(x, y, y, y, y)
DF[,2][DF[,1]==unique(x)[1]] <- 1:length(x)
DF[,3][DF[,1]==unique(x)[2]] <- 1:length(x)
DF[,4][DF[,1]==unique(x)[3]] <- 1:length(x)
DF[,5][DF[,1]==unique(x)[4]] <- 1:length(x)
Or wrap it in a loop if you have many columns to handle:
DF <- data.frame(x, y, y, y, y)
for(i in 1:4){
DF[,i+1][DF[,1]==unique(x)[i]] <- 1:length(x)
}
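The warnings arise because `1:length(x)` is assigned into fewer positions than it has elements. Below is a sketch of a variant that sizes the replacement to the number of matching rows and loops over however many distinct codes there are, rather than a hard-coded four. The names `codes`, `out`, `cd`, and `idx` are illustrative, and a matrix is used for the counts as a design choice, not because the answer above does so.

```r
x <- c("R1111", "R1112", "R1111", "R1111", "R1113",
       "R1112", "R1112", "R1112", "R1113", "R1115")

codes <- unique(x)
# One zero-filled column per distinct code, named after the code.
out <- matrix(0, nrow = length(x), ncol = length(codes),
              dimnames = list(NULL, codes))

for (cd in codes) {
  idx <- x == cd                      # rows holding this code
  out[idx, cd] <- seq_len(sum(idx))   # 1, 2, ... exactly as many as needed
}

DF <- data.frame(code = x, out)
```

Because the columns are named up front, the result carries the code names instead of the default `y`, `y.1`, ... headers.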
#4
0
sapply can do this. Here code is stored as a plain vector, and some post-processing generates the actual data.frame.
code <- c("R1111", "R1112", "R1111", "R1111", "R1113", "R1112", "R1112",
"R1112", "R1113", "R1115")
val <- sapply(sort(unique(code)), function(thiscode)
(code==thiscode)*cumsum(code==thiscode)
)
The output is a matrix:
R1111 R1112 R1113 R1115
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 2 0 0 0
[4,] 3 0 0 0
[5,] 0 0 1 0
[6,] 0 2 0 0
[7,] 0 3 0 0
[8,] 0 4 0 0
[9,] 0 0 2 0
[10,] 0 0 0 1
Formatting it as follows gives the desired output:
val <- data.frame(code=code, val)
#5
0
A fairly simple base R solution:
m <- sapply(unique(code$code),'==',code$code)
m2 <- apply(m,2,cumsum)
m2[!m] <- 0
cbind(code,`colnames<-`(m2,unique(code$code)))
# code R1111 R1112 R1113 R1115
# 1 R1111 1 0 0 0
# 2 R1112 0 1 0 0
# 3 R1111 2 0 0 0
# 4 R1111 3 0 0 0
# 5 R1113 0 0 1 0
# 6 R1112 0 2 0 0
# 7 R1112 0 3 0 0
# 8 R1112 0 4 0 0
# 9 R1113 0 0 2 0
# 10 R1115 0 0 0 1