How to correlate one variable with all the other variables in R [duplicate]

This question already has an answer here:

How can correlate against multiple columns using ddply? 5 answers

I want to correlate one variable (say, tyrosine) with all the other variables (about 200 other metabolites, such as urea, glucose, inosine, etc.) in R, and I'm not sure how to go about it. I'm new to R.

I've learned the pairs function, but that pairs every metabolite in the specified range with every other one.

Thanks!

3 answers

#1

Since you mention "metabolites", I assume your metric is "concentration", i.e. that you have a matrix (call it data) with one column for every metabolite and one row for every sample.

So, something like this:

# just generates example - YOU SHOULD PROVIDE THIS!!!
set.seed(1)  # reproduces the values printed below
data <- data.frame(tyrosine=1:10 + rnorm(10,sd=2),
                   urea    =2*1:10 + rnorm(10,sd=2),
                   glucose =30 -2*1:10 +rnorm(10,sd=2),
                   inosine =25 -1:10 + rnorm(10,sd=2))
data
     tyrosine      urea  glucose  inosine
1  -0.2529076  5.023562 29.83795 26.71736
2   2.3672866  4.779686 27.56427 22.79442
3   1.3287428  4.757519 24.14913 22.77534
4   7.1905616  3.570600 18.02130 20.89239
5   5.6590155 12.249862 21.23965 17.24588
6   4.3590632 11.910133 17.88774 18.17001
7   7.9748581 13.967619 15.68841 17.21142
8   9.4766494 17.887672 11.05850 16.88137
9  10.1515627 19.642442 11.04370 18.20005
10  9.3892232 21.187803 10.83588 16.52635

To get correlation coefficients, just type:

cor(data)
           tyrosine       urea    glucose    inosine
tyrosine  1.0000000  0.8087897 -0.9545523 -0.8512938
urea      0.8087897  1.0000000 -0.8577782 -0.8086910
glucose  -0.9545523 -0.8577782  1.0000000  0.8608000
inosine  -0.8512938 -0.8086910  0.8608000  1.0000000
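
Since the question asks only for tyrosine against everything else, one option is to take the tyrosine row of that matrix and drop its correlation with itself. This is a minimal sketch, not part of the original answer: the names `cor_all` and `tyr_cor` are made up for illustration, and `set.seed(1)` is included only so the example is reproducible.

```r
set.seed(1)  # assumption: fixed seed so the example data are reproducible
data <- data.frame(tyrosine = 1:10 + rnorm(10, sd = 2),
                   urea     = 2 * 1:10 + rnorm(10, sd = 2),
                   glucose  = 30 - 2 * 1:10 + rnorm(10, sd = 2),
                   inosine  = 25 - 1:10 + rnorm(10, sd = 2))

# Full correlation matrix, as in the answer
cor_all <- cor(data)

# Keep only the tyrosine row and drop its correlation with itself
tyr_cor <- cor_all["tyrosine", colnames(cor_all) != "tyrosine"]
tyr_cor
```

The result is a named numeric vector with one entry per remaining metabolite, which is usually easier to scan than the full matrix when there are about 200 columns.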

To generate a scatterplot matrix, just type:

pairs(data)

[Image: scatterplot matrix produced by pairs(data)]

In future, please include an example of your data that can be imported into R.

#2

In the following example, I simply split a data frame that contains all of the variables into two matrices. These can be entered into the cor function to obtain your correlation values:

set.seed(1)
n=20
df <- data.frame(tyrosine=runif(n), urea=runif(n), glucose=runif(n), inosine=runif(n))
df

COR <- cor(as.matrix(df[,1]), as.matrix(df[,-1]))
COR
#           urea    glucose    inosine
#[1,] -0.2373854 -0.3672984 -0.3393602
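
With around 200 metabolites, it may help to turn that one-row matrix into a named vector and sort it by strength of association. The sketch below assumes the same simulated `df`; the names `target`, `others`, `r` and `r_sorted` are hypothetical, and `use = "pairwise.complete.obs"` is only relevant if the real data contain missing values.

```r
set.seed(1)
n <- 20
df <- data.frame(tyrosine = runif(n), urea = runif(n),
                 glucose = runif(n), inosine = runif(n))

target <- "tyrosine"                  # column of interest (illustrative name)
others <- setdiff(names(df), target)  # every other column

# cor() returns a 1 x k matrix here; drop() turns it into a named vector.
# use = "pairwise.complete.obs" skips missing values pair by pair.
r <- drop(cor(df[[target]], df[others], use = "pairwise.complete.obs"))

# Strongest associations (by absolute value) first
r_sorted <- r[order(abs(r), decreasing = TRUE)]
r_sorted
```

Sorting by absolute value keeps strong negative correlations near the top instead of pushing them to the end of the list.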

#3

Similar to Marc in the box's answer, but using apply, which keeps the column names:

> set.seed(1)
> n=20
> df <- data.frame(tyrosine=runif(n), urea=runif(n), glucose=runif(n), 
  inosine=runif(n))

> apply(df,2, function(col)cor(col, df$tyrosine))

tyrosine       urea    glucose    inosine 
1.0000000 -0.2373854 -0.3672984 -0.3393602

It's a good question, and a useful pattern to know for data of reasonable size: if you only want the tyrosine correlations (which is what the OP specifically asked for), it is more efficient to calculate only those (time and space proportional to n, the number of variables) than all vs all (~n^2 time and space).
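
To make the size difference concrete, here is a rough sketch on simulated data with 200 columns. The matrix `m`, the sample count and the `metab` column names are all invented for illustration; only the figure of about 200 metabolites comes from the question.

```r
set.seed(1)
n_samples <- 50    # hypothetical number of samples
n_metab   <- 200   # roughly the number of metabolites in the question

# Simulated concentrations: one row per sample, one column per metabolite
m <- matrix(rnorm(n_samples * n_metab), nrow = n_samples,
            dimnames = list(NULL, paste0("metab", seq_len(n_metab))))
colnames(m)[1] <- "tyrosine"

# One-vs-all: a single call that yields 199 coefficients
one_vs_all <- drop(cor(m[, "tyrosine"], m[, -1]))

# All-vs-all: a 200 x 200 matrix, of which only one row is needed
all_vs_all <- cor(m)

length(one_vs_all)   # 199
dim(all_vs_all)      # 200 200

# Both routes should agree on the tyrosine correlations
all.equal(one_vs_all, all_vs_all["tyrosine", -1])
```

At this scale either approach runs quickly, so the choice is mostly about clarity; the saving matters more as the number of variables grows.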
