I am reading data from a CSV file. I want to compute the sum of each row, sort the rows by those row sums, and then select the rows whose row sum meets a specified threshold. I tried this on tempdata.csv, which contains the following data:
> data <- read.csv("tempdata.csv")
> data
        X Doc1 Doc2 Doc3 Doc4
1    book    2    0    2    1
2   table    0    2    0    1
3    room    0    2    0    0
4   chair    0    0    2    0
5 speaker    0    0    0    0
> m <- data.matrix(data[2:length(data)], rownames.force=NA)
> (dimnames(m)[[1]] <- data[,1])
> rs1 <- rowSums(m, na.rm = FALSE)
Now I don't know how to attach the row sums to the matrix 'm'. I am very new to R and am not able to write clean code to achieve this. Please help; thanks in advance.
2 solutions
#1
1
This will sort the data.frame or data.matrix by rowSums, largest first:
m[sort(rowSums(m), index.return = TRUE, decreasing = TRUE)$ix, ]
If you only want the rows that meet a threshold, you don't need to sort:
m[rowSums(m) > threshold, ]
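One thing to watch for with this kind of filter: when only a single row passes the threshold, R simplifies the result to a plain vector unless drop = FALSE is given. The sketch below is only an illustration; the matrix is rebuilt by hand from the sample data in the question, and the threshold of 3 is an assumed value.

```r
# Rebuild the question's sample matrix by hand (values as printed in the question)
m <- matrix(c(2, 0, 2, 1,
              0, 2, 0, 1,
              0, 2, 0, 0,
              0, 0, 2, 0,
              0, 0, 0, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("book", "table", "room", "chair", "speaker"),
                            c("Doc1", "Doc2", "Doc3", "Doc4")))

threshold <- 3  # assumed value, for illustration only

# Only "book" (row sum 5) is strictly greater than 3,
# so default subsetting collapses the result to a named vector
dropped <- m[rowSums(m) > threshold, ]

# drop = FALSE keeps the result as a one-row matrix
kept <- m[rowSums(m) > threshold, , drop = FALSE]

print(is.matrix(dropped))
print(is.matrix(kept))
```

Using drop = FALSE keeps later steps such as cbind() or rownames() behaving the same way regardless of how many rows match.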
If you want to add a column containing the rowSums values:
m <- cbind(m, rowSums(m))
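Putting the three pieces together, here is a minimal end-to-end sketch on the question's sample data. It uses order() in place of sort(..., index.return = TRUE), which gives the same row permutation more directly. The matrix is typed in by hand rather than read from the CSV file, and the column name total and the threshold of 3 are assumptions made for this example.

```r
# Sample data from the question, entered by hand instead of read from tempdata.csv
m <- matrix(c(2, 0, 2, 1,
              0, 2, 0, 1,
              0, 2, 0, 0,
              0, 0, 2, 0,
              0, 0, 0, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("book", "table", "room", "chair", "speaker"),
                            c("Doc1", "Doc2", "Doc3", "Doc4")))

# Sort rows by row sum, largest first
m_sorted <- m[order(rowSums(m), decreasing = TRUE), , drop = FALSE]

# Append the row sums as a named column ("total" is a name chosen for this sketch)
m_with_sum <- cbind(m_sorted, total = rowSums(m_sorted))

# Keep only rows whose sum meets the threshold (assumed value)
threshold <- 3
selected <- m_with_sum[m_with_sum[, "total"] >= threshold, , drop = FALSE]

print(selected)
```

Naming the new column makes the later filter readable, since the sum can be referred to as "total" rather than by column position.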
#2
0
Thank you @6pool for your answer. I used the following code to achieve the goal.
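data <- read.csv("tiny.csv")
# keep only the numeric columns (drop the first column of labels)
data2 <- data[, 2:length(data)]
# add a column holding each row's sum
data2 <- transform(data2, sum = rowSums(data2))
# use the first column of the original data as row names
(dimnames(data2)[[1]] <- data[, 1])
# sort rows by sum, largest first
data3 <- data2[order(-data2$sum), ]
### specify the threshold to select the number of rows
threshold <- 3
(data4 <- data3[data3$sum >= threshold, ])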