How to create a contrast-like table in R

I need help converting a data frame column of codes into a set of columns that look like contrasts in R. For example:

code <- data.frame(code = c('R1111', 'R1112', 'R1111', 'R1111', 'R1113', 
                            'R1112', 'R1112', 'R1112', 'R1113', 'R1115'))

I need to convert this to the following table:

    code   R1111  R1112   R1113   R1115
1  R1111     1      0       0       0
2  R1112     0      1       0       0
3  R1111     2      0       0       0 
4  R1111     3      0       0       0 
5  R1113     0      0       1       0 
6  R1112     0      2       0       0 
7  R1112     0      3       0       0 
8  R1112     0      4       0       0 
9  R1113     0      0       2       0 
10 R1115     0      0       0       1

I have 1400 rows of codes like these that I need to convert. Note that within each code's column the value increases by one at every occurrence of that code. I tried to do this with reshape2, but I keep getting errors and haven't been able to figure it out. How can I get this result?

5 solutions

#1

One option is to combine mapply with ifelse: for each unique code, take the running count of matches with cumsum and set every non-matching row to 0:

cbind(code,mapply(function(x){
  ifelse(code$code==x,cumsum(code$code==x),0)
}, unique(as.character(code$code))))

#     code R1111 R1112 R1113 R1115
# 1  R1111     1     0     0     0
# 2  R1112     0     1     0     0
# 3  R1111     2     0     0     0
# 4  R1111     3     0     0     0
# 5  R1113     0     0     1     0
# 6  R1112     0     2     0     0
# 7  R1112     0     3     0     0
# 8  R1112     0     4     0     0
# 9  R1113     0     0     2     0
# 10 R1115     0     0     0     1
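To see why this works, here is a minimal sketch of the computation for a single code. The intermediate names is_match, running and r1112 are made up for illustration and are not part of the answer above.

```r
code <- data.frame(code = c("R1111", "R1112", "R1111", "R1111", "R1113",
                            "R1112", "R1112", "R1112", "R1113", "R1115"))

# TRUE on the rows that hold the code we are building a column for
is_match <- code$code == "R1112"

# running count of matches seen so far (TRUE counts as 1, FALSE as 0)
running <- cumsum(is_match)

# keep the running count on matching rows, 0 everywhere else
r1112 <- ifelse(is_match, running, 0)
r1112
```

mapply simply repeats this step once per unique code and binds the resulting vectors into the columns of a matrix.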

#2

You can use model.matrix to generate the dummy (indicator) matrix, then multiply it by each row's running count within its code (1 for the first occurrence, 2 for the second, and so on).

# running count within each code: base R version (commented out) or, more succinctly, data.table::rowid
# code$tag = with(code, as.numeric(ave(as.character(code), code, 
#                                  FUN=function(x) cumsum(duplicated(x))+1L)))
code$tag = data.table::rowid(code$code) 

model.matrix(~ 0 + code, data=code)* code$tag
#    codeR1111 codeR1112 codeR1113 codeR1115
# 1          1         0         0         0
# 2          0         1         0         0
# 3          2         0         0         0
# 4          3         0         0         0
# 5          0         0         1         0
# 6          0         2         0         0
# 7          0         3         0         0
# 8          0         4         0         0
# 9          0         0         2         0
# 10         0         0         0         1
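If you would rather avoid the data.table dependency and get the column names from the question, something like the following sketch should work. It assumes the running count can be built with ave and seq_along, and the names tag, mm and result are only illustrative.

```r
code <- data.frame(code = c("R1111", "R1112", "R1111", "R1111", "R1113",
                            "R1112", "R1112", "R1112", "R1113", "R1115"))

# running count within each code, base R only
tag <- ave(seq_along(code$code), code$code, FUN = seq_along)

# one indicator column per code; each row is scaled by that row's count
mm <- model.matrix(~ 0 + code, data = code) * tag

# model.matrix prefixes each level with the variable name; strip that prefix
colnames(mm) <- sub("^code", "", colnames(mm))

result <- cbind(code, mm)
result
```

Note that model.matrix orders the columns by factor level (alphabetically for character data), not by first appearance.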

#3

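A base R approach. It may emit "number of items to replace is not a multiple of replacement length" warnings, because 1:length(x) is longer than the number of matching rows; only the leading values are used, so the warnings can be ignored: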

x <- code$code
y <- rep(0, length(x))

DF <- data.frame(x, y, y, y, y)
DF[,2][DF[,1]==unique(x)[1]] <- 1:length(x)
DF[,3][DF[,1]==unique(x)[2]] <- 1:length(x)
DF[,4][DF[,1]==unique(x)[3]] <- 1:length(x)
DF[,5][DF[,1]==unique(x)[4]] <- 1:length(x)

Or wrap it in a loop if you have a lot of columns to handle:

DF <- data.frame(x, y, y, y, y)
for(i in 1:4){
  DF[,i+1][DF[,1]==unique(x)[i]] <- 1:length(x)
}
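A variant of the same loop that avoids hard-coding the number of columns, names each column after its code, and assigns exactly as many values as there are matching rows (so no warnings are raised) might look like this. It is a sketch rather than the answerer's code; codes, cd and hit are illustrative names.

```r
x <- c("R1111", "R1112", "R1111", "R1111", "R1113",
       "R1112", "R1112", "R1112", "R1113", "R1115")

codes <- unique(x)            # one output column per distinct code
DF <- data.frame(code = x)

for (cd in codes) {
  hit <- x == cd                        # rows holding this code
  DF[[cd]] <- 0L                        # start the column at zero
  DF[[cd]][hit] <- seq_len(sum(hit))    # 1, 2, 3, ... on the matching rows
}
DF
```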

#4

sapply can do this. I store code as a vector and do some post-processing to generate the actual data.frame.

code <- c("R1111", "R1112", "R1111", "R1111", "R1113", "R1112", "R1112", 
"R1112", "R1113", "R1115")

val <- sapply(sort(unique(code)), function(thiscode) 
  (code==thiscode)*cumsum(code==thiscode)
)

The output is a matrix:

      R1111 R1112 R1113 R1115
 [1,]     1     0     0     0
 [2,]     0     1     0     0
 [3,]     2     0     0     0
 [4,]     3     0     0     0
 [5,]     0     0     1     0
 [6,]     0     2     0     0
 [7,]     0     3     0     0
 [8,]     0     4     0     0
 [9,]     0     0     2     0
[10,]     0     0     0     1

Formatting it as follows gives the desired output:

val <- data.frame(code=code, val)

#5

A fairly simple base solution:

m  <- sapply(unique(code$code),'==',code$code)
m2 <- apply(m,2,cumsum)
m2[!m] <- 0
cbind(code,`colnames<-`(m2,unique(code$code)))

#     code R1111 R1112 R1113 R1115
# 1  R1111     1     0     0     0
# 2  R1112     0     1     0     0
# 3  R1111     2     0     0     0
# 4  R1111     3     0     0     0
# 5  R1113     0     0     1     0
# 6  R1112     0     2     0     0
# 7  R1112     0     3     0     0
# 8  R1112     0     4     0     0
# 9  R1113     0     0     2     0
# 10 R1115     0     0     0     1
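For the full 1400-row data it may be convenient to wrap the idea in a small function. The sketch below is one possible way to do that and uses a different technique from the answers above: it fills a zero matrix through a two-column (row, column) index. The function name running_indicator is hypothetical.

```r
# Hypothetical helper: one column per distinct code, holding the running
# count of that code on the rows where it occurs and 0 elsewhere.
running_indicator <- function(x) {
  x   <- as.character(x)
  lev <- unique(x)                       # columns in order of first appearance
  out <- matrix(0L, nrow = length(x), ncol = length(lev),
                dimnames = list(NULL, lev))
  # (row, column) pairs: row i goes into the column of its own code
  idx <- cbind(seq_along(x), match(x, lev))
  out[idx] <- ave(seq_along(x), x, FUN = seq_along)
  data.frame(code = x, out, check.names = FALSE)
}

code <- data.frame(code = c("R1111", "R1112", "R1111", "R1111", "R1113",
                            "R1112", "R1112", "R1112", "R1113", "R1115"))
res <- running_indicator(code$code)
res
```

Because as.character is applied first, the function should accept either a character vector or a factor.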
