Possible Duplicate:
How to sort a dataframe by column(s) in R
I was wondering if someone could help me out. I have what I thought should be an easy problem to solve.
I have the table below:
SampleID Cluster
R0132F041p 1
R0132F127 1
R0132F064 1
R0132F068p 1
R0132F015 2
R0132F094 3
R0132F105 1
R0132F013 2
R0132F114 1
R0132F014 2
R0132F039p 3
R0132F137 1
R0132F059 1
R0132F138p 2
R0132F038p 2
and I would like to sort/order it by Cluster to get the results as below:
SampleID Cluster
R0132F041p 1
R0132F127 1
R0132F064 1
R0132F068p 1
R0132F105 1
R0132F114 1
R0132F137 1
R0132F059 1
R0132F015 2
R0132F013 2
R0132F014 2
R0132F138p 2
R0132F038p 2
R0132F094 3
R0132F039p 3
I have tried the following R code:
data<-read.table('Table.txt', header=TRUE,row.names=1,sep='\t')
data <- data.frame(data)
data <- data[order(data$Cluster),]
write.table(data, file = 'OrderedTable.txt', append = TRUE,quote=FALSE, sep = '\t', na ='NA', dec = '.', row.names = TRUE, col.names = FALSE)
and get the following output:
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 2
10 2
11 2
12 2
13 2
14 3
15 3
Why have the SampleIDs been replaced by the numbers 1-15, and what do these numbers represent? I have read the ?order page, but it seems to explain sort.list better than order(). If anyone could help me out with this, I would be very grateful.
2 Answers
#1
10
The short answer is that you did it perfectly. You are just having some difficulty with reading and writing files. Going through your code:
data<-read.table('Table.txt', header=TRUE,row.names=1,sep='\t')
The above line is reading in your data fine, but the row.names=1 told it to use the first column as names for rows. So now your SampleIDs are row names instead of being their own column. If you type data or head(data) or str(data) immediately after running this line, this should be clear. Just omit that row.names argument and it will read properly.
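To make the effect of row.names concrete, here is a minimal sketch. It assumes a few rows of the question's table supplied inline through the `text =` argument, standing in for Table.txt so that the example runs on its own:

```r
# A few rows of the question's table, tab-separated, supplied inline
# (an assumption so the example runs without Table.txt).
txt <- "SampleID\tCluster\nR0132F041p\t1\nR0132F015\t2\nR0132F094\t3\nR0132F105\t1"

# With row.names = 1 the SampleIDs become row names; Cluster is the only column.
with_rn <- read.table(text = txt, header = TRUE, row.names = 1, sep = "\t")
names(with_rn)       # column names
rownames(with_rn)    # the SampleIDs live here

# Without row.names, SampleID stays an ordinary column.
without_rn <- read.table(text = txt, header = TRUE, sep = "\t")
names(without_rn)
```

Comparing str(with_rn) and str(without_rn) shows the same difference at a glance.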
data <- data.frame(data)
You don't need this above line because read.table() produces a dataframe. You can see that with str(data) as well.
data <- data[order(data$Cluster),]
The above line is perfect.
write.table(data, file = 'OrderedTable.txt', append = TRUE,
quote=FALSE, sep = '\t', na ='NA', dec = '.', row.names = TRUE,
col.names = FALSE)
Here you included the argument col.names = FALSE which is why your file doesn't have column names. You also don't need/want append=TRUE. If you look at help(write.table), you see it is "only relevant if file is a character string". Here it seems to make the file write without ending the last line, which would likely cause any later read.table() to complain.
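As a rough illustration of these two arguments, the sketch below writes a tiny stand-in data frame (hypothetical values in the question's layout) to a temporary file rather than OrderedTable.txt. It shows that col.names = TRUE writes a header line and that append = TRUE adds to the existing file contents instead of replacing them:

```r
# Tiny stand-in data frame (hypothetical values in the question's layout).
df <- data.frame(SampleID = c("R0132F015", "R0132F041p"), Cluster = c(2, 1))
out <- tempfile(fileext = ".txt")   # temporary file instead of OrderedTable.txt

# Default append = FALSE overwrites; col.names = TRUE writes the header line.
write.table(df, file = out, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)
n_first <- length(readLines(out))    # header plus two data rows

# append = TRUE adds to whatever is already in the file, so re-running a
# script accumulates rows instead of replacing the old output.
write.table(df, file = out, append = TRUE, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)
n_second <- length(readLines(out))   # two more data rows than before
```

If a script using append = TRUE has been run several times, deleting the output file before the next run avoids reading stale rows.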
The numbers 1-15 in your result look like row numbers. You don't explain how you look at the resulting file, so I cannot be sure. You likely read your file in a way that doesn't parse the row.names and is showing row numbers instead. If you make certain your SampleIDs column does not get assigned to be names of rows, you'll probably be fine.
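Putting the suggestions together, one possible corrected workflow might look like the sketch below. The inline `txt` string and the temporary output file are assumptions standing in for Table.txt and OrderedTable.txt, and the variable name `samples` is only illustrative:

```r
# Inline stand-in for Table.txt (a few rows from the question).
txt <- "SampleID\tCluster\nR0132F041p\t1\nR0132F015\t2\nR0132F094\t3\nR0132F105\t1\nR0132F013\t2"

# Read without row.names so SampleID remains a regular column.
samples <- read.table(text = txt, header = TRUE, sep = "\t")

# Sort by Cluster; order() leaves tied rows in their original order.
sorted <- samples[order(samples$Cluster), ]

# Write with a header and without row names, overwriting any old file.
out <- tempfile(fileext = ".txt")
write.table(sorted, file = out, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)

# Read the result back to confirm the SampleIDs survived the round trip.
check <- read.table(out, header = TRUE, sep = "\t")
check
```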
#2
5
Have a look at the arrange function of the plyr package.
library(plyr)
# arrange() returns a sorted copy, so assign the result before writing it
data <- arrange(data, Cluster)
write.table(data, "ordered_data.txt")