Looping over rows for a subset of columns

I have a data frame with column 1 being the gene and all other columns being gene expression data for that gene under different conditions. I want to go gene by gene and divide all the expression values by the median expression value for that gene. I have the medians in a data frame called s.med.df.

I’m trying to direct R to divide all the expression columns (2:n) but not the first column by the median value for each gene. I'm new to R, but the script I have so far is as follows:

Con1 <- c(5088.77, 274.62, 251.97, 122.21)
Con2 <- c(4382.59, 288.55, 208.12, 171.93)
Con3 <- c(4732.81, 417.43, 305.58, 132.93)
Solid.df <- data.frame(Gene = c("A", "B", "C", "D"), Con1=Con1, Con2=Con2, Con3=Con3)

Gene    Con1     Con2     Con3
A       5088.77  4382.59  4732.81
B       274.62   288.55   417.43
C       251.97   208.12   305.58
D       122.21   171.93   132.93

n <- ncol(Solid.df)
genes = levels(s.med.df$Gene)
Solid.mt.df = Solid.df
for (i in 1:length(genes)) {
  gene = genes[i]
  Solid.mt.df[2:n][Solid.mt.df$Gene == gene] = Solid.mt.df[2:n][Solid.mt.df$Gene == gene] / s.med.df$Medians[i]
  print(gene)
}

Thank you in advance

3 solutions

#1

This can be done with direct division once the medians in s.med.df are converted to a vector. See the following example.

d1 <- data.frame(ge=c("A", "B", "C"), e1=1:3, e2=7:9,
                 stringsAsFactors = FALSE)
m1 <- data.frame(md=4:6, stringsAsFactors = FALSE)

d1[,2:3]/unlist(m1)
#     e1   e2
# 1 0.25 1.75
# 2 0.40 1.60
# 3 0.50 1.50

You can also bind the gene names to the results.

cbind(d1[,1], d1[,2:3]/unlist(m1))
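
Applied to the data in the question, the same idea might look like the sketch below. The layout of s.med.df is an assumption (a Gene column and a Medians column, as the loop in the question suggests), and it is built here from the row medians only so that the example runs on its own. The match() call is an extra safeguard, not part of the answer above, in case the two data frames list the genes in a different order.

```r
Con1 <- c(5088.77, 274.62, 251.97, 122.21)
Con2 <- c(4382.59, 288.55, 208.12, 171.93)
Con3 <- c(4732.81, 417.43, 305.58, 132.93)
Solid.df <- data.frame(Gene = c("A", "B", "C", "D"),
                       Con1 = Con1, Con2 = Con2, Con3 = Con3)

# ASSUMPTION: s.med.df has one row per gene with columns Gene and Medians.
# It is derived from the row medians here purely for illustration.
s.med.df <- data.frame(Gene = Solid.df$Gene,
                       Medians = apply(Solid.df[, -1], 1, median))

# Line the medians up with the gene order of Solid.df
med <- s.med.df$Medians[match(Solid.df$Gene, s.med.df$Gene)]

# A vector with one value per row is recycled down each column,
# so every row is divided by its own median
Solid.mt.df <- Solid.df
Solid.mt.df[, -1] <- Solid.df[, -1] / med
Solid.mt.df
```

Because the division is vectorised, no loop over genes is needed.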

#2

For anything to do with applying a function over columns or rows, you're looking for apply:

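# genes is the data frame: gene names in column 1, expression values in the rest
median_centered <- t(apply(genes[, 2:ncol(genes)], 1, function(x) x / median(x)))
# genes[1] stays a data frame, so cbind keeps the expression columns numeric
genes2 <- cbind(genes[1], median_centered)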

This takes the data frame except for the first column, iterates over the 1st axis (rows), and applies x / median(x) to those rows. Since R broadcasts scalar operations to vectors, you'll get the desired result, but it will be transposed, so calling t() on it turns it back into the original format. Then we can cbind it back with the gene names.
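
To make the transposition concrete, here is a small sketch using the Solid.df data from the question. The intermediate names expr, raw, centered and result are made up for illustration.

```r
Solid.df <- data.frame(Gene = c("A", "B", "C", "D"),
                       Con1 = c(5088.77, 274.62, 251.97, 122.21),
                       Con2 = c(4382.59, 288.55, 208.12, 171.93),
                       Con3 = c(4732.81, 417.43, 305.58, 132.93))

expr <- as.matrix(Solid.df[, -1])   # 4 genes x 3 conditions

# apply() over rows returns one column per input row
raw <- apply(expr, 1, function(x) x / median(x))
dim(raw)                            # 3 x 4: conditions by genes

# t() restores the genes-by-conditions layout
centered <- t(raw)

# Solid.df[1] is a one-column data frame, so the result is a data frame
# with numeric expression columns
result <- cbind(Solid.df[1], centered)
result
```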

#3

As @VenYao pointed out, you can use direct division if you turn your medians into a vector. It would help to show the structure of your s.med.df object.

This can be achieved using data.table pretty easily:

First, cbind your data frames into a data.table:

library(data.table)
combined <- data.table(cbind(Solid.df, s.med.df))
# assume median is the column in s.med.df that stores median values
combined[, med.con1 := Con1/median]
# then you can repeat that for the other two conditions:
combined[, med.con2 := Con2/median]
combined[, med.con3 := Con3/median]
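
If there are many conditions, repeating one line per column gets tedious. One possible way to handle all of them at once is sketched below. It assumes s.med.df has columns Gene and Medians (the names used in the question's loop, not the median column assumed above), and it joins the two tables with merge() rather than cbind() so that the Gene column is not duplicated. The names cond_cols and new_cols are made up for illustration.

```r
library(data.table)

Solid.df <- data.frame(Gene = c("A", "B", "C", "D"),
                       Con1 = c(5088.77, 274.62, 251.97, 122.21),
                       Con2 = c(4382.59, 288.55, 208.12, 171.93),
                       Con3 = c(4732.81, 417.43, 305.58, 132.93))

# ASSUMPTION: s.med.df holds one median per gene in a column called Medians.
# It is derived from the row medians here purely for illustration.
s.med.df <- data.frame(Gene = Solid.df$Gene,
                       Medians = apply(Solid.df[, -1], 1, median))

# Join on Gene so each row carries its own median
combined <- merge(as.data.table(Solid.df), as.data.table(s.med.df), by = "Gene")

cond_cols <- c("Con1", "Con2", "Con3")
new_cols  <- paste0("med.", tolower(cond_cols))   # med.con1, med.con2, med.con3

# Divide every condition column by the Medians column in one step
combined[, (new_cols) := lapply(.SD, function(x) x / Medians), .SDcols = cond_cols]
combined
```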
