I created a random forest and predicted the classes of my test set, which are living happily in a dataframe:
row.names  class
564028     1
275747     1
601137     0
922930     1
481988     1
...
The row.names attribute tells me which row was which before I did various operations that scrambled the order of the rows. So far so good.
Now I would like to get a general feel for the accuracy of my predictions. To do this, I need to take this dataframe and reorder it in ascending order according to the row.names attribute. This way, I can compare the observations, row-wise, to the labels, which I already know.
Forgive me for asking such a basic question, but for the life of me, I can't find a good source of information regarding how to do such a trivial task.
The documentation implores me to:
use attr(x, "row.names") if you need to retrieve an integer-valued set of row names.
but this leaves me with nothing but NULL.
My question is, how can I use row.names, which has been loyally following me around in the various incarnations of dataframes throughout my workflow? Isn't this what it is there for?
7 answers
#1
12
This worked for me:
new_df <- df[ order(row.names(df)), ]
#2
18
None of the solutions would actually work. It should be:
df[ order(as.numeric(row.names(df))),]
#assuming the data frame is called df
This is because row names in R are of type character, so when the as.numeric part is missing the rows are arranged as 1, 10, 11, and so on.
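A minimal sketch of the difference, assuming a small made-up data frame (the values and the names by_string and by_number are only for illustration):

```r
# Hypothetical data frame whose row names are numeric-looking strings
df <- data.frame(class = c(1, 1, 0, 1, 1),
                 row.names = c("10", "2", "1", "11", "3"))

# Character sort: the row names are compared as strings
by_string <- df[order(row.names(df)), , drop = FALSE]

# Numeric sort: convert the row names to numbers before ordering
by_number <- df[order(as.numeric(row.names(df))), , drop = FALSE]

row.names(by_string)  # "1" "10" "11" "2" "3"
row.names(by_number)  # "1" "2" "3" "10" "11"
```

drop = FALSE is only there because the example has a single column.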
#3
2
For completeness:
@BondedDust's answer works perfectly for the rownames attribute, but your example does not use the rownames attribute. The output provided in your question indicates a column named "row.names", which isn't the same thing (as noted in @BondedDust's comment). Here is the answer if you wish to sort by the "row.names" column in the example given in your question (there is another posting on this, located here). This answer assumes you are using a dataframe named "df", with one column named "row.names":
ordered.df <- df[order(df$row.names),] #this orders the df by the "row.names" column
Alternatively, to order by the first column (same thing if you're still using your example):
ordered.df <- df[order(df[,1]),] #this orders the df by the first column
Hope this is helpful!
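To make the column-versus-attribute distinction concrete, here is a rough sketch that rebuilds the five rows shown in the question. The temporary column name id is my own; the column is renamed afterwards because data.frame() would otherwise treat row.names = as the row-names argument rather than as a column:

```r
# Rebuild the rows from the question with the IDs in an ordinary column
df <- data.frame(id = c(564028, 275747, 601137, 922930, 481988),
                 class = c(1, 1, 0, 1, 1))
names(df)[1] <- "row.names"

# The actual row names are just the automatic ones, not the IDs
rownames(df)

# Order by the "row.names" column instead
ordered.df <- df[order(df[["row.names"]]), ]
ordered.df[["row.names"]]  # IDs in ascending order
```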
#4
1
The "[" function returns rows in the order of any index vector that can be matched to rownames():
df[ rownames(df) , ]
You might have thought it would be necessary to use:
df[ order(rownames(df)) , ]
But for row names 1:100 that would have given you an ordering of 1, 10, 100, 11, 12, ..., 2, 20, 21, ..., because row names are character strings and order() sorts them lexically.
#5
0
Assuming your data frame is named 'df', you can create a new ordered data frame 'ord.df' that will contain the row names of df as well as its values with the following one line of code:
ord.df <- cbind(rownames(df)[order(rownames(df))], df[order(rownames(df)), ])
#6
0
new_df <- df[ order(row.names(df)), ]
or something similar won't work. After this statement, new_df no longer has row names. I guess a better solution is to add the row names as a column, sort by that column, and then set it as the row names again.
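The add-a-column approach could be sketched roughly like this. The data frame and the helper column id are made up for illustration, and it assumes the row names are numeric-looking strings:

```r
# Hypothetical data frame with numeric-looking row names
df <- data.frame(class = c(1, 1, 0), row.names = c("10", "2", "1"))

# Copy the row names into a helper column (name "id" is an assumption)
df$id <- as.numeric(rownames(df))

# Sort by the helper column
new_df <- df[order(df$id), ]

# Set the row names explicitly from the column, then drop the helper
rownames(new_df) <- as.character(new_df$id)
new_df$id <- NULL

rownames(new_df)  # "1" "2" "10"
```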
#7
0
If you have only one column in your dataframe, as in my case, you have to add drop=FALSE:
df[ order(rownames(df)), , drop=FALSE]
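A small sketch of what goes wrong without it, using a made-up one-column data frame (as_vector and as_frame are just illustrative names):

```r
# Hypothetical one-column data frame
df <- data.frame(class = c(1, 0, 1), row.names = c("3", "1", "2"))

# Without drop = FALSE the single column collapses to a plain vector
as_vector <- df[order(rownames(df)), ]

# With drop = FALSE the result stays a data frame and keeps its row names
as_frame <- df[order(rownames(df)), , drop = FALSE]

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
rownames(as_frame)        # "1" "2" "3"
```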