
Time: 2022-05-25 04:37:05

I have a correlation matrix, data.corr, covering 163 companies, so it has 163 rows and 163 columns.


I want to extract the five highest correlation values for each company, so I wrote the following code:


COR <- matrix(nrow = 5, ncol = 163)
for (i in 1:163) {
  COR[, i] <- tail(sort(data.corr[i, ]), 5)
}

It works well, but the problem is that it doesn't carry over the row and column names from the original correlation matrix for each value.


The row and column names are the company names. Here is a sample of the correlation matrix:


           X601288.SS X601988.SS X601998.SS X601818.SS X601939.SS X601398.SS
X601288.SS  1.0000000  0.7628263  0.6130694  0.7947062  0.7578003  0.7568238
X601988.SS  0.7628263  1.0000000  0.7280957  0.6925497  0.8402101  0.8409767
X601998.SS  0.6130694  0.7280957  1.0000000  0.6715793  0.7118446  0.6716997
X601818.SS  0.7947062  0.6925497  0.6715793  1.0000000  0.6825405  0.6471228
X601939.SS  0.7578003  0.8402101  0.7118446  0.6825405  1.0000000  0.8390544
X601398.SS  0.7568238  0.8409767  0.6716997  0.6471228  0.8390544  1.0000000

Here is a sample of the five highest correlations:


          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]
[1,] 0.7568238 0.7280957 0.6894561 0.6715793 0.7558052 0.7323083 0.7323083 0.6894561
[2,] 0.7578003 0.7628263 0.7118446 0.6825405 0.7578003 0.7568238 0.7472125 0.6956420
[3,] 0.7628263 0.8402101 0.7223088 0.6925497 0.8390544 0.8390544 0.7558052 0.7007705
[4,] 0.7947062 0.8409767 0.7280957 0.7947062 0.8402101 0.8409767 0.7618053 0.7618053
[5,] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000

Any ideas on how to sort the values together with their matching column and row names?


2 solutions



Something like this should simplify your code and work for you:


# Toy data
data.corr <- cor(matrix(rnorm(200), 20, 10))
rownames(data.corr) <- colnames(data.corr) <- paste0("company", 1:10)

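# Get the 5 highest correlations for each company
# (rows 2 to 6 of the sorted columns; row 1 is the self-correlation of 1)
COR <- apply(data.corr, 2, sort, decreasing = TRUE)[1:5 + 1, ]

# Get the corresponding row indices, i.e. which companies those values belong to
COR_comp <- apply(data.corr, 2, order, decreasing = TRUE)[1:5 + 1, ]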

If you want the company names rather than row numbers to appear in COR_comp, you can modify it further. For example, the following fills in the company names:


COR_comp[] <- colnames(COR_comp)[c(COR_comp)]
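
As a further sketch (not part of the answer above): sort() keeps the names of a named vector, so the names can be carried along by collecting each company's top values in a named list instead of writing them into an unnamed matrix. The helper name top_n_corr and the decision to drop the self-correlation are assumptions made for illustration.

```r
# Toy data with company names on both dimensions
set.seed(1)
data.corr <- cor(matrix(rnorm(200), 20, 10))
rownames(data.corr) <- colnames(data.corr) <- paste0("company", 1:10)

# Hypothetical helper: the n highest correlations for one company, names retained
top_n_corr <- function(corr, row_name, n = 5) {
  x <- corr[row_name, ]
  x <- x[names(x) != row_name]          # drop the self-correlation (assumption)
  head(sort(x, decreasing = TRUE), n)   # sort() preserves the names
}

# One named numeric vector per company
top5 <- lapply(setNames(nm = rownames(data.corr)),
               function(nm) top_n_corr(data.corr, nm))
top5[["company1"]]
```

Each list element is a named vector, so names(top5[["company1"]]) gives the partner companies and the values give the matching correlations.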



Here is a way to do it with data.table and dplyr:


library(dplyr)       # %>% and mutate()
library(data.table)  # as.data.table() and melt()

# example data
data <- matrix(rnorm(1000), nrow = 20) %>% data.frame
data.corr <- cor(data)

melted_data <- data.corr %>%
  data.frame %>% # convert to a data.frame
  mutate(row_company = row.names(.)) %>% # new column holding the row names
  as.data.table %>% # convert to a data.table so that data.table's melt() applies
  melt(id.vars = "row_company") # reshape from wide to long

# sort by company, then by descending correlation,
# then select the top 5 for each company
# (the first row per company is its self-correlation of 1)
final_output <- melted_data[order(variable, -value)][, head(.SD, 5), by = variable]
head(final_output, 10)
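
If the self-correlation should not count towards the top five, one possible variation is to filter out the diagonal pairs before taking the head of each group. This is only a sketch under assumptions: the column names row_company, variable and value are reused here for illustration, and the long table is built with base R's as.data.frame(as.table(...)) rather than melt().

```r
library(data.table)

# Toy correlation matrix with named rows and columns
set.seed(1)
data.corr <- cor(matrix(rnorm(1000), nrow = 20))
rownames(data.corr) <- colnames(data.corr) <- paste0("X", seq_len(ncol(data.corr)))

# Long format: one row per (row_company, variable) pair
long <- as.data.frame(as.table(data.corr), stringsAsFactors = FALSE)
setDT(long)
setnames(long, c("row_company", "variable", "value"))

# Drop self-correlations, sort, then keep the 5 largest values per company
top5 <- long[row_company != variable][order(variable, -value)][, head(.SD, 5), by = variable]
head(top5, 10)
```

The result has one row per (company, partner) pair, so the names stay attached to every value.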


