Parsing rows that contain '.' from a text file in R

Time: 2021-08-13 09:47:04

I am trying to extract all rows that contain the missing-data marker '.', because I need the missing-data information. I used the code below and got the output and error shown. The same code works on a simpler data file with 4 columns, where the second and third columns hold A, T, G, C and '.'. It does not work with this dataset, possibly because the data structure differs: ncol reports only 2 columns. A snippet of the data file is given below, followed by a snippet of the desired output.

> test2.txt <- read.table("ref_qry.snps", header=F, sep="\t", skip=5)

> subset(test2.txt, rowSums(test2.txt[1:2] == ".") >0) 

[1] V1 V2
    <0 rows> (or 0-length row.names)
> subset(test2.txt, rowSums(test2.txt[,1] == ".") >0)
    Error in rowSums(test2.txt[, 1] == ".") : 
      'x' must be an array of at least two dimensions


1341   C T   9464894   |        8      373  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1
1360   C .   9464875   |        1      392  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1
1361   A .   9464875   |        1      392  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1
1362   G .   9464875   |        1      392  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1
1363   A T   9464875   |        1      392  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1
1402   . A   9464835   |        5      432  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1
1407   A G   9464830   |        4      437  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1


1360   C .   9464875   |        1      392  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1
1361   A .   9464875   |        1      392  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1
1362   G .   9464875   |        1      392  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1
1402   . A   9464835   |        5      432  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1

1 solution

#1



Does this achieve what you want?

# Treat any run of whitespace (spaces or tabs) as the separator
x <- read.table("ref_qry.snps", header=FALSE, sep="", skip=5)
#                                             ^^^^^^
# Now the columns are distinct, so columns 2 and 3 can be checked for "."
x[ x[[2]] == "." | x[[3]] == ".", ]

It works for me when I copy and paste your data into a text file.
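
For anyone who wants to check this without the original file, here is a minimal sketch that writes the seven sample rows from the question to a temporary file and filters them. The temporary file stands in for "ref_qry.snps" and has no header lines, so skip is omitted; the variable names (sample_rows, missing, same, one_col) are only illustrative. It also shows why the second attempt in the question raised an error: selecting a single column with [, 1] drops to a plain vector, and rowSums needs something two-dimensional.

```r
# Sample rows copied from the question; the temp file is a stand-in for "ref_qry.snps".
sample_rows <- c(
  "1341   C T   9464894   |        8      373  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1",
  "1360   C .   9464875   |        1      392  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1",
  "1361   A .   9464875   |        1      392  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1",
  "1362   G .   9464875   |        1      392  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1",
  "1363   A T   9464875   |        1      392  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1",
  "1402   . A   9464835   |        5      432  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1",
  "1407   A G   9464830   |        4      437  |  9798893  9465266  |  1 -1  Supercontig_12.1  scaffold_1"
)
tmp <- tempfile(fileext = ".snps")
writeLines(sample_rows, tmp)

# sep = "" splits on any run of spaces or tabs, so every field gets its own column.
x <- read.table(tmp, header = FALSE, sep = "", stringsAsFactors = FALSE)

# V2 and V3 hold the bases; keep rows where either one is ".".
missing <- x[x$V2 == "." | x$V3 == ".", ]
print(missing$V1)  # 1360 1361 1362 1402

# The rowSums approach from the question also works once the input is two-dimensional.
same <- x[rowSums(x[, 2:3] == ".") > 0, ]

# For a single column, drop = FALSE keeps the data frame shape that rowSums requires.
one_col <- x[rowSums(x[, 2, drop = FALSE] == ".") > 0, ]
```

With the real file, keep skip=5 as in the question so the header lines are not read as data.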
