R: Number of entries in a column, excluding blanks

Time: 2023-02-04 00:15:50

My data looks like this:

CHROM           Mutant_SNP_2
3RD                 T
4RD                 C
5RD                 
6RD                 G
7RD                 A
8RD                  

I have a data frame read from a CSV file. I want to count how many rows in column "Mutant_SNP_2" have an entry, so blanks (" ") should not be counted, and I want the counts broken down by column "CHROM". With dplyr, count(combined, Mutant_SNP_2, wt = CHROM, sort = FALSE) gives me the layout I want, but it counts only the blank rows rather than those with a value. Any ideas would be much appreciated. The output I get:

 Mutant_SNP_2                         CHROM.x     n
         (fctr)                          (fctr) (int)
1               gi|339957448|gb|AENI01001139.1|    23
2               gi|339957449|gb|AENI01001138.1|     9
3               gi|339957451|gb|AENI01001136.1|    97
4               gi|339957452|gb|AENI01001135.1|   116
5               gi|339957453|gb|AENI01001134.1|   175
6               gi|339957454|gb|AENI01001133.1|     2
7               gi|339957455|gb|AENI01001132.1|    78
8               gi|339957456|gb|AENI01001131.1|    51
9               gi|339957457|gb|AENI01001130.1|     2
10              gi|339957458|gb|AENI01001129.1|    52
..          ...                             ...   ...

4 solutions

#1


3  

You can try the table function; the TRUE row gives the number of non-blank values for each CHROM value:

table(df$Mutant_SNP_2!="", df$CHROM)

You can get the result directly with table(df$Mutant_SNP_2!="", df$CHROM)[2, ]

Example:

set.seed(123)
df <- data.frame(CHROM=sample(letters[1:3], 10, replace=TRUE), Mutant_SNP_2=sample(c("", "not blank"), 10, replace=TRUE), stringsAsFactors=FALSE)

table(df$Mutant_SNP_2!="", df$CHROM)
#        a b c
#  FALSE 0 2 3
#  TRUE  2 2 1

table(df$Mutant_SNP_2!="", df$CHROM)[2, ]
# a b c 
# 2 2 1
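
The question describes blanks as " ", so some cells may hold whitespace or NA rather than a truly empty string. The following is a minimal sketch of the same table approach under that assumption; the sample data and the name has_entry are made up for illustration. It also indexes the TRUE row by name rather than by position, so the lookup still works when a column has no blanks at all.

```r
# Hypothetical sample data: one whitespace-only cell, one empty cell, one NA.
df <- data.frame(CHROM = c("a", "a", "b", "b", "c"),
                 Mutant_SNP_2 = c("T", " ", "", NA, "G"),
                 stringsAsFactors = FALSE)

# A cell counts as an entry if it is not NA and is non-empty after trimming.
has_entry <- !is.na(df$Mutant_SNP_2) & nzchar(trimws(df$Mutant_SNP_2))

# Fixing the factor levels guarantees that both rows exist in the table.
tab <- table(has_entry = factor(has_entry, levels = c(FALSE, TRUE)),
             CHROM = df$CHROM)

tab["TRUE", ]
# a b c
# 1 0 1
```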

#2


1  

We could sum the logical vector df$Mutant_SNP_2 != "" grouped by CHROM. This works because TRUE is coerced to 1 and FALSE to 0.

library(dplyr)
df %>% group_by(CHROM) %>%
  summarise(n = sum(Mutant_SNP_2 != "")) 

   CHROM     n
  (fctr) (int)
1    3RD     1
2    4RD     1
3    5RD     0
4    6RD     1
5    7RD     1
6    8RD     0
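
The coercion this answer relies on can be checked directly in base R. This is only an illustrative sketch; the vector snp is a made-up copy of the sample column from the question.

```r
# Made-up copy of the Mutant_SNP_2 column from the question.
snp <- c("T", "C", "", "G", "A", "")

# Comparing against "" yields a logical vector.
has_entry <- snp != ""

# Arithmetic on logicals treats TRUE as 1 and FALSE as 0.
as.integer(has_entry)  # 1 1 0 1 1 0
sum(has_entry)         # 4 non-blank entries
mean(has_entry)        # proportion of non-blank entries
```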

#3


1  

Try this:

library(data.table)

setDT(df)[ Mutant_SNP_2 != "", .(count = .N), by=CHROM]

Or, if what you want is the number of distinct values per CHROM (note that a blank "" counts as a value here):

setDT(df)[ ,.(count= length(unique(Mutant_SNP_2))),  by=CHROM]
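
One thing to be aware of with the first data.table call above: filtering in i drops every CHROM group that has no non-blank entries, so those groups are missing from the result rather than shown with a count of 0. The sketch below, using made-up data that mirrors the question, contrasts that with summing the logical vector in j, which keeps every group. The names dt, filtered and summed are illustrative only.

```r
library(data.table)

# Made-up data mirroring the sample in the question.
dt <- data.table(CHROM = c("3RD", "4RD", "5RD", "6RD", "7RD", "8RD"),
                 Mutant_SNP_2 = c("T", "C", "", "G", "A", ""))

# Filtering first: groups with only blanks (5RD, 8RD) disappear.
filtered <- dt[Mutant_SNP_2 != "", .(count = .N), by = CHROM]

# Summing the logical vector: every group is kept, blanks-only groups get 0.
summed <- dt[, .(count = sum(Mutant_SNP_2 != "")), by = CHROM]

nrow(filtered)  # 4
nrow(summed)    # 6
```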

#4


0  

We can use ave from base R to do this:

with(df1, as.numeric(ave(Mutant_SNP_2, CHROM, 
               FUN= function(x)  sum(nzchar(x)))))
#[1] 1 1 0 1 1 0
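
Note that ave returns one value per row (the group's count repeated for each of its rows), and because the input column is character the result comes back as character, hence the as.numeric. If one count per CHROM is wanted instead, tapply is a possible base R alternative. This is a sketch on made-up data matching the question; df1 is rebuilt here only so the block runs on its own.

```r
# Made-up data matching the sample in the question.
df1 <- data.frame(CHROM = c("3RD", "4RD", "5RD", "6RD", "7RD", "8RD"),
                  Mutant_SNP_2 = c("T", "C", "", "G", "A", ""),
                  stringsAsFactors = FALSE)

# One value per row: the group's non-blank count, repeated within the group.
per_row <- with(df1, as.numeric(ave(Mutant_SNP_2, CHROM,
                                    FUN = function(x) sum(nzchar(x)))))

# One value per group: a named vector indexed by CHROM.
per_group <- with(df1, tapply(nzchar(Mutant_SNP_2), CHROM, sum))

per_row    # 1 1 0 1 1 0
per_group
```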
