Here is my problem in R:
mtable <- read.table(paste(".frunc_1362704682.4574","/groups.txt",sep=""),sep="\t",comment.char='',skip=0, header=TRUE, fill=TRUE,check.names=FALSE)
The first (folder) argument to paste() is normally held in a variable; it is hard-coded here for debugging purposes.
I always get the message:
Error in read.table(paste(".frunc_1362704682.4574", "/groups.txt", sep = ""), :
duplicate 'row.names' are not allowed
But when I look at the file, which has this header:
root_node_name node_name node_id #genes_in_root_node #genes_in_node #genes_with_variable=1_in_root_node #genes_with_variable=1_in_node raw_p_underrepresentation_of_variable=1 raw_p_overrepresentation_of_variable=1 FWER_underrepresentation FWER_overrepresentation FDR_underrepresentation FDR_overrepresentation
I cannot see any duplicates. :( I read in another discussion that I should try:
mtable <- read.table(paste(".frunc_1362704682.4574","/groups.txt",sep=""),sep="\t",comment.char='',skip=0, header=TRUE, fill=TRUE,check.names=FALSE,row.names=NULL)
That works, but afterwards all the headings are shifted one column to the right:
> head(mtable, n=1)
row.names root_node_name node_name
1 molecular_function trans-hexaprenyltranstransferase activity GO:0000010
node_id #genes_in_root_node #genes_in_node
1 17668 2 2419
#genes_with_variable=1_in_root_node #genes_with_variable=1_in_node
1 0 0.74491
raw_p_underrepresentation_of_variable=1
1 1
raw_p_overrepresentation_of_variable=1 FWER_underrepresentation
1 1 1
FWER_overrepresentation FDR_underrepresentation FDR_overrepresentation
1
Any ideas to get it right? :(
EDIT:
Okay, as a commenter said, this is mainly a problem with the rows. Stupid as I am, I thought it might come from the header. But I don't want to name the rows; it should just read them in. It can't be that hard, can it?
File content:
molecular_function trans-hexaprenyltranstransferase activity GO:0000010 17668 2 2419 0 0.74491 1 1 1 -1 -1
molecular_function single-stranded DNA specific endodeoxyribonuclease activity GO:0000014 17668 5 2419 0 0.478885 1 1 1 -1 -1
molecular_function lactase activity GO:0000016 17668 1 2419 0 0.863086 1 1 1 -1 -1
molecular_function alpha-1,3-mannosyltransferase activity GO:0000033 17668 3 2419 0 0.64291 1 1 1 -1 -1
molecular_function tRNA binding GO:0000049 17668 27 2419 7 0.975698 0.0663832 1 1 -1 -1
molecular_function fatty-acyl-CoA binding GO:0000062 17668 20 2419 6 0.986407 0.0460431 1 1 -1 -1
molecular_function L-ornithine transmembrane transporter activity GO:0000064 17668 1 2419 0 0.863086 1 1 1 -1 -1
molecular_function S-adenosylmethionine transmembrane transporter activity GO:0000095 17668 1 2419 0 0.863086 1 1 1 -1 -1
4 Answers
#1
9
According to the R documentation for read.table,
If there is a header and the first row contains one fewer field
than the number of columns, the first column in the input is used
for the row names. Otherwise if row.names is missing, the rows are numbered.
... therefore I'd suggest that the first row may have one fewer field than the number of columns, so read.table() is selecting the first column (which contains more than one copy of molecular_function) as the row names.
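The rule quoted above can be reproduced with a small sketch. The data here is made up for illustration (a 3-field header over 4-field rows) and is not the asker's file; it only shows how a short header makes read.table() treat the first column as row names:

```r
# Hypothetical data: the header has 3 fields, every data row has 4,
# which is the situation the documentation quote describes.
lines <- c(
  "node_name\tnode_id\tn_genes",
  "molecular_function\tlactase activity\tGO:0000016\t1",
  "molecular_function\ttRNA binding\tGO:0000049\t27"
)

# The first column is taken as row names. It contains "molecular_function"
# twice, so read.table() stops with an error; tryCatch() captures the message.
res <- tryCatch(
  read.table(text = lines, sep = "\t", header = TRUE, check.names = FALSE),
  error = function(e) conditionMessage(e)
)
print(res)
```

If the first column held unique values, the same call would succeed silently and use them as row names, which is why the error only shows up on files with repeated values in that column.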
#2
1
The answer here (https://*.com/a/22408965/2236315) by @adrianoesch should help.
Note that if you open the file in a text editor, you should see that the number of header fields is less than the number of columns in the rows below the header. In my case, the data set had a "," missing at the end of the last header field.
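Instead of counting by eye in an editor, the field counts can also be checked from R with count.fields(). A minimal sketch, assuming a tab-separated file; the temporary file and its contents are invented for the example:

```r
# Hypothetical file: 3 header fields, but each data row ends in an extra tab.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "root_node_name\tnode_name\tnode_id",
  "molecular_function\tlactase activity\tGO:0000016\t",
  "molecular_function\ttRNA binding\tGO:0000049\t"
), path)

# Number of fields on each line. Quoting and comment handling are switched
# off so that characters such as '#' in the header are counted as plain text.
n_fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")
print(n_fields)

# TRUE when the header is exactly one field shorter than the data rows,
# i.e. the case in which read.table() uses the first column as row names.
header_is_short <- all(n_fields[-1] == n_fields[1] + 1)
print(header_is_short)
```

If the first number is one smaller than the rest, the header is short (or the rows carry a trailing separator), and either the file or the read.table() call needs adjusting.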
#3
0
I ran into the same problem, and the issue was a lot of tab-only whitespace at the bottom of my text file. As a result, every row name on these lines was the same (i.e. blank). This occurred because I had converted the file from Excel.
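One way to deal with such trailing lines without editing the file by hand is to drop whitespace-only lines before parsing. This is only a sketch with invented data; for a real file, `raw` would come from readLines() on the file path:

```r
# Hypothetical export: two real rows followed by lines containing only tabs.
raw <- c(
  "id\tvalue",
  "a\t1",
  "b\t2",
  "\t",
  "\t"
)

# Keep only lines that contain at least one non-whitespace character.
kept <- raw[grepl("[^[:space:]]", raw)]

# Parse the cleaned lines as usual.
df <- read.table(text = kept, sep = "\t", header = TRUE)
print(df)
```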
#4
0
I have automatically generated data files that wind up with one column that is empty apart from the header. I don't want to have to edit each file separately (and risk fouling it up). The best workaround I found was in question #4066607: include "row.names=NULL" in the arguments.
DF<-read.csv(file, ..... , row.names=NULL)
This isn't perfect, but it lets me load the file. Unlike the behavior described in the other answer (forcing the addition of an extra column of row numbers), I get the original first column labeled "row.names" and all the headers shifted one column to the right... but it lets me get all the data in.
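The shifted headers can be moved back after loading. A minimal sketch, assuming the cause is a trailing separator on each data row (so the last column is empty); the data and the placeholder name "extra" are made up for the example:

```r
# Hypothetical data: 2 header fields, but each row ends in a trailing tab,
# so every data row has 3 fields.
lines <- c(
  "node_name\tnode_id",
  "lactase activity\tGO:0000016\t",
  "lactase activity\tGO:0000049\t"
)

df <- read.table(text = lines, sep = "\t", header = TRUE,
                 check.names = FALSE, row.names = NULL)
# At this point the names are "row.names", "node_name", "node_id":
# every header sits one column too far to the right.

# Shift the names one position to the left, give the empty last column
# a placeholder name, then drop that column.
names(df) <- c(names(df)[-1], "extra")
df$extra <- NULL
print(names(df))
```

This only makes sense when the last column really is empty; if it holds data, the header itself is missing a field and should be fixed instead.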