I have a correlation matrix, data.corr, covering 163 companies, so it has 163 rows and 163 columns.
I want to extract the five highest correlation values for each company, so I wrote the following code:
COR<- matrix(nrow = 5, ncol = 163)
for(i in 1:163){COR[,i]<-tail(sort(data.corr[i,]),5)}
It works well, but the problem is that it doesn't carry over the row and column names from the original correlation matrix for each value.
The row and column names are the company names. Here is a sample of the correlation matrix:
head(data.corr)
           X601288.SS X601988.SS X601998.SS X601818.SS X601939.SS X601398.SS
X601288.SS  1.0000000  0.7628263  0.6130694  0.7947062  0.7578003  0.7568238
X601988.SS  0.7628263  1.0000000  0.7280957  0.6925497  0.8402101  0.8409767
X601998.SS  0.6130694  0.7280957  1.0000000  0.6715793  0.7118446  0.6716997
X601818.SS  0.7947062  0.6925497  0.6715793  1.0000000  0.6825405  0.6471228
X601939.SS  0.7578003  0.8402101  0.7118446  0.6825405  1.0000000  0.8390544
X601398.SS  0.7568238  0.8409767  0.6716997  0.6471228  0.8390544  1.0000000
Here is a sample of the five highest correlations:
head(COR)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0.7568238 0.7280957 0.6894561 0.6715793 0.7558052 0.7323083 0.7323083 0.6894561
[2,] 0.7578003 0.7628263 0.7118446 0.6825405 0.7578003 0.7568238 0.7472125 0.6956420
[3,] 0.7628263 0.8402101 0.7223088 0.6925497 0.8390544 0.8390544 0.7558052 0.7007705
[4,] 0.7947062 0.8409767 0.7280957 0.7947062 0.8402101 0.8409767 0.7618053 0.7618053
[5,] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
Any ideas on how to sort the values while keeping their matching column and row names?
2 solutions
#1
1
Something like this should simplify your code and work for you:
# Toy data
data.corr <- cor(matrix(rnorm(200), 20, 10))
rownames(data.corr) <- colnames(data.corr) <- paste0("company", 1:10)
print(data.corr)
# Get highest correlations for each company
COR <- apply(data.corr, 2, sort, decreasing = TRUE)[1:5 + 1, ]
# Get corresponding rows / companies
COR_comp <- apply(data.corr, 2, order, decreasing = TRUE)[1:5 + 1, ]
If you want company names rather than row indices to appear in COR_comp, you can modify it further. For example, the following fills in the company names:
# The entries of COR_comp are row indices into data.corr, so look them up in its row names
COR_comp[] <- rownames(data.corr)[COR_comp]
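If keeping each value attached to its company name is the main goal, one alternative is to build a list of named vectors instead of two separate matrices. The following is only a minimal sketch on toy data: the list name top5 is hypothetical, and it relies on sort() preserving the names of a named vector. Unlike the apply() version above, it does not skip the self-correlation, so each company appears first in its own list.

```r
# Toy data standing in for the real 163 x 163 matrix (an assumption for illustration)
set.seed(1)
data.corr <- cor(matrix(rnorm(200), 20, 10))
rownames(data.corr) <- colnames(data.corr) <- paste0("company", 1:10)

# For each company, sort its row in decreasing order and keep the first five
# entries; sort() keeps the names of a named vector, so every value stays
# labelled with the company it refers to
top5 <- lapply(rownames(data.corr), function(i) {
  head(sort(data.corr[i, ], decreasing = TRUE), 5)
})
names(top5) <- rownames(data.corr)  # 'top5' is a hypothetical name

# Named numeric vector of the five highest correlations for company1
top5[["company1"]]
```

Each element of top5 can then be inspected by company name, for example names(top5[["company3"]]) to list the partner companies.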
#2
1
Here is a way to do it with data.table and dplyr:
library(data.table)
library(dplyr)
# Example data
data <- matrix(rnorm(1000), nrow = 20) %>% data.frame
data.corr <- cor(data)
melted_data <- data.corr %>%
  data.frame %>%                          # convert to a data.frame
  mutate(row_company = row.names(.)) %>%  # store the row names in a column
  as.data.table %>%                       # convert to a data.table so melt() uses the data.table method
  melt(id.vars = "row_company")           # reshape from wide to long
# Sort by company, then by descending correlation,
# then select the top 5 rows within each company
final_output <- melted_data[order(variable, -value)][, head(.SD, 5), by = variable]
head(final_output, 10)
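The same long-format idea can also be sketched in base R alone, which may be convenient if data.table and dplyr are not installed. This is a hedged sketch on toy data, not part of the original answer: the names long, rank_in_group and top5_long are hypothetical, and dropping the diagonal (each company's correlation of 1 with itself) is an added assumption about what is wanted.

```r
# Toy 50 x 50 correlation matrix with company names (illustrative data only)
set.seed(1)
data.corr <- cor(matrix(rnorm(1000), nrow = 20))
rownames(data.corr) <- colnames(data.corr) <- paste0("company", 1:50)

# as.table() + as.data.frame() reshapes the matrix into long format:
# one row per (row company, column company) pair
long <- as.data.frame(as.table(data.corr), stringsAsFactors = FALSE)
names(long) <- c("row_company", "col_company", "value")

# Drop self-correlations (the diagonal of the matrix)
long <- long[long$row_company != long$col_company, ]

# Order by company, then by decreasing correlation
long <- long[order(long$col_company, -long$value), ]

# Number the rows within each company and keep the first five
rank_in_group <- ave(long$value, long$col_company, FUN = seq_along)
top5_long <- long[rank_in_group <= 5, ]
head(top5_long, 10)
```

Each row of top5_long carries both company names next to the correlation value, so nothing has to be looked up afterwards.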