Reshape three column data frame to matrix ("long" to "wide" format)

Time: 2021-01-19 04:27:28

I have a data.frame that looks like this.

x a 1 
x b 2 
x c 3 
y a 3 
y b 3 
y c 2 

I want this in matrix form so I can feed it to heatmap to make a plot. The result should look something like:

    a    b    c
x   1    2    3
y   3    3    2

I have tried cast from the reshape package, and I have tried writing a manual function to do this, but I cannot seem to get it right.

2 answers

#1


151  

There are many ways to do this. This answer starts with my favorite ways, but also collects various ways from answers to similar questions scattered around this site.

tmp <- data.frame(x=gl(2,3, labels=letters[24:25]),
                  y=gl(3,1,6, labels=letters[1:3]), 
                  z=c(1,2,3,3,3,2))

Using reshape2:

library(reshape2)
acast(tmp, x~y, value.var="z")

Using matrix indexing:

with(tmp, {
  out <- matrix(nrow=nlevels(x), ncol=nlevels(y),
                dimnames=list(levels(x), levels(y)))
  out[cbind(x, y)] <- z
  out
})
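
To make the indexing step less magical, here is a minimal sketch of the same idea with the factor-to-integer conversion written out explicitly. It assumes the `tmp` data frame from above; the names `idx` and `out` are just for illustration. A two-column integer matrix used as an index is read as (row, column) pairs, so each `z` value lands in the cell named by its factor levels.

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

# Integer codes of the factors: one (row, column) pair per data row
idx <- cbind(as.integer(tmp$x), as.integer(tmp$y))

# Empty matrix with one row per level of x and one column per level of y
out <- matrix(NA_real_,
              nrow = nlevels(tmp$x), ncol = nlevels(tmp$y),
              dimnames = list(levels(tmp$x), levels(tmp$y)))

# A two-column index matrix assigns element-wise to the listed cells
out[idx] <- tmp$z
out
```

Combinations that never occur in the data stay `NA` with this approach.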

Using xtabs:

xtabs(z~x+y, data=tmp)
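
One thing worth knowing about xtabs is that it sums values, so duplicated (x, y) pairs are added together and combinations that are absent come out as 0 rather than NA. The sketch below uses a made-up data frame `tmp2` (not from the question) to show both effects, and then strips the table class in case a plain matrix is wanted.

```r
# Hypothetical data: the pair (x, a) occurs twice, the pair (y, b) never
tmp2 <- data.frame(x = factor(c("x", "x", "x", "y")),
                   y = factor(c("a", "a", "b", "a")),
                   z = c(1, 10, 2, 3))

xt <- xtabs(z ~ x + y, data = tmp2)
xt["x", "a"]  # duplicated pair: the two z values are summed
xt["y", "b"]  # missing pair: filled with 0

# Drop the "xtabs"/"table" class and keep only values and dimnames
m <- matrix(as.vector(xt), nrow = nrow(xt),
            dimnames = unname(dimnames(xt)))
m
```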

You can also use reshape, as suggested in "Convert table into matrix by column names", though you have to do a little manipulation afterwards to remove the extra column and get the names right (not shown).

> reshape(tmp, idvar="x", timevar="y", direction="wide")
  x z.a z.b z.c
1 x   1   2   3
4 y   3   3   2
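
For completeness, one possible version of that follow-up manipulation is sketched below. It assumes the `tmp` data frame from above and that the value column is named `z`, so the wide columns carry a `z.` prefix; the names `wide` and `m` are illustrative.

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

wide <- reshape(tmp, idvar = "x", timevar = "y", direction = "wide")

# Drop the id column, then move it into the row names
m <- as.matrix(wide[, -1])
rownames(m) <- as.character(wide$x)

# Strip the "z." prefix that reshape() adds to the column names
colnames(m) <- sub("^z\\.", "", colnames(m))
m
```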

There's also sparseMatrix within the Matrix package, as seen in "R - convert BIG table into matrix by column names":

> library(Matrix)
> with(tmp, sparseMatrix(i = as.numeric(x), j=as.numeric(y), x=z,
+                        dimnames=list(levels(x), levels(y))))
2 x 3 sparse Matrix of class "dgCMatrix"
  a b c
x 1 2 3
y 3 3 2

The daply function from the plyr library could also be used, as here: https://*.com/a/7020101/210673

> library(plyr)
> daply(tmp, .(x, y), function(x) x$z)
   y
x   a b c
  x 1 2 3
  y 3 3 2

dcast from reshape2 also works, as in "Reshape data for values in one column", but you get a data.frame with a column for the x value.

> dcast(tmp, x~y, value.var="z")
  x a b c
1 x 1 2 3
2 y 3 3 2

Similarly, spread from "tidyr" would also work for such a transformation:

library(tidyr)
spread(tmp, y, z)
#   x a b c
# 1 x 1 2 3
# 2 y 3 3 2
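
Note that spread() is marked as superseded in newer tidyr releases (1.0.0 and later), where pivot_wider() is the recommended replacement. A minimal sketch, assuming such a tidyr version is installed and using the same `tmp` as above, followed by a conversion to a matrix (the names `wide` and `m` are illustrative):

```r
library(tidyr)

tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

# One output column per level of y, filled with the z values
wide <- pivot_wider(tmp, names_from = y, values_from = z)

# As with dcast() and spread(), the result still has an x column,
# so move it into the row names to get a plain numeric matrix
m <- as.matrix(wide[, -1])
rownames(m) <- as.character(wide$x)
m
```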

#2


2  

The question is some years old but maybe some people are still interested in alternative answers.

If you don't want to load any packages, you might use this function:

#' Converts three columns of a data.frame into a matrix -- e.g. to plot 
#' the data via image() later on. Two of the columns form the row and
#' col dimensions of the matrix. The third column provides values for
#' the matrix.
#' 
#' @param data data.frame: input data
#' @param rowtitle string: row-dimension; name of the column in data whose distinct values should be used as row names in the output matrix
#' @param coltitle string: col-dimension; name of the column in data whose distinct values should be used as column names in the output matrix
#' @param datatitle string: name of the column in data whose values should be filled into the output matrix
#' @param rowdecreasing logical: should the row names be in ascending (FALSE) or in descending (TRUE) order?
#' @param coldecreasing logical: should the col names be in ascending (FALSE) or in descending (TRUE) order?
#' @param default_value numeric: default value of matrix entries if no value exists in data.frame for the entries
#' @return matrix: matrix containing values of data[[datatitle]] with rownames data[[rowtitle]] and colnames data[[coltitle]]
#' @author Daniel Neumann
#' @date 2017-08-29
data.frame2matrix = function(data, rowtitle, coltitle, datatitle, 
                             rowdecreasing = FALSE, coldecreasing = FALSE,
                             default_value = NA) {

  # check, whether titles exist as columns names in the data.frame data
  if ( (!(rowtitle%in%names(data))) 
       || (!(coltitle%in%names(data))) 
       || (!(datatitle%in%names(data))) ) {
    stop('data.frame2matrix: bad row-, col-, or datatitle.')
  }

  # get number of rows in data
  ndata = dim(data)[1]

  # extract rownames and colnames for the matrix from the data.frame
  rownames = sort(unique(data[[rowtitle]]), decreasing = rowdecreasing)
  nrows = length(rownames)
  colnames = sort(unique(data[[coltitle]]), decreasing = coldecreasing)
  ncols = length(colnames)

  # initialize the matrix
  out_matrix = matrix(NA, 
                      nrow = nrows, ncol = ncols,
                      dimnames=list(rownames, colnames))

  # iterate rows of data
  for (i1 in seq_len(ndata)) {
    # get matrix-row and matrix-column indices for the current data-row
    iR = which(rownames==data[[rowtitle]][i1])
    iC = which(colnames==data[[coltitle]][i1])

    # throw an error if the matrix entry (iR,iC) is already filled.
    if (!is.na(out_matrix[iR, iC])) stop('data.frame2matrix: double entry in data.frame')
    out_matrix[iR, iC] = data[[datatitle]][i1]
  }

  # set empty matrix entries to the default value
  out_matrix[is.na(out_matrix)] = default_value

  # return matrix
  return(out_matrix)

}

How it works:

myData = as.data.frame(list('dim1'=c('x', 'x', 'x', 'y','y','y'),
                            'dim2'=c('a','b','c','a','b','c'),
                            'values'=c(1,2,3,3,3,2))) 

myMatrix = data.frame2matrix(myData, 'dim1', 'dim2', 'values')

myMatrix
>   a b c
> x 1 2 3
> y 3 3 2
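
If the row-by-row loop turns out to be slow on larger inputs, the same idea can probably be written without a loop by combining match() with matrix indexing. The function below is only a sketch under that assumption: `df2mat` is a hypothetical name, it always sorts in ascending order, and unlike the function above it pre-fills the matrix with the default value and detects duplicates from the index pairs.

```r
# Hypothetical loop-free variant; base R only
df2mat <- function(data, rowtitle, coltitle, datatitle, default_value = NA) {
  rn <- sort(unique(data[[rowtitle]]))
  cn <- sort(unique(data[[coltitle]]))

  # Start with every cell set to the default value
  out <- matrix(default_value, nrow = length(rn), ncol = length(cn),
                dimnames = list(as.character(rn), as.character(cn)))

  # One (row, column) index pair per row of data
  idx <- cbind(match(data[[rowtitle]], rn), match(data[[coltitle]], cn))
  if (anyDuplicated(idx)) stop("df2mat: duplicate (row, column) pair in data")

  out[idx] <- data[[datatitle]]
  out
}

myData <- data.frame(dim1 = c("x", "x", "x", "y", "y", "y"),
                     dim2 = c("a", "b", "c", "a", "b", "c"),
                     values = c(1, 2, 3, 3, 3, 2))
myMatrix2 <- df2mat(myData, "dim1", "dim2", "values")
myMatrix2
```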
