I want to count the number of NA values in a data frame column. Say my data frame is called df, and the name of the column I am considering is col. The way I have come up with is the following:
sapply(df$col, function(x) sum(length(which(is.na(x)))))
Is this a good/most efficient way to do this?
11 answers
#1
213
You're over-thinking the problem:
sum(is.na(df$col))
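To see why this one-liner is enough: is.na() returns a logical vector, and sum() counts each TRUE as 1. A minimal sketch, assuming a made-up data frame (only the names df and col come from the question; the values are illustrative):

```r
# Hypothetical example data; the values are invented for illustration.
df <- data.frame(col = c(4, NA, 7, NA, NA))

na_flags <- is.na(df$col)   # logical vector: TRUE where the value is NA
n_na <- sum(na_flags)       # TRUE is counted as 1, FALSE as 0

print(na_flags)
print(n_na)
```

Because is.na() is already vectorised over the whole column, there is no need to wrap it in sapply() as in the question.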
#2
58
If you are looking for NA counts for each column in a dataframe then:
na_count <- sapply(x, function(y) sum(length(which(is.na(y)))))
should give you a named vector with the counts for each column.
na_count <- data.frame(na_count)
This should output the data nicely in a dataframe like:
------------------------
| row.names | na_count |
------------------------
| column_1  | count    |
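The two steps above can be tried end to end on a small data frame. This is only a sketch: the data frame x and its columns a, b and d are hypothetical, and the inner function uses the shorter sum(is.na(y)), which gives the same count as sum(length(which(is.na(y)))).

```r
# Hypothetical data frame standing in for x in the answer above.
x <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"), d = 1:3)

# sapply() over a data frame visits each column and, here, returns a
# named integer vector with one count per column.
na_count <- sapply(x, function(y) sum(is.na(y)))

# Wrapping the vector in data.frame() turns its names into row names.
na_count <- data.frame(na_count)
print(na_count)
```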
#3
13
If you are looking to count the number of NAs in the entire dataframe, you could also use:
sum(is.na(df))
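As a rough illustration of what happens here: calling is.na() on a whole data frame gives a logical matrix with one cell per value, and sum() adds up all the TRUE cells. The data below is made up for the example.

```r
# Hypothetical data frame with NAs in both a numeric and a character column.
df <- data.frame(x = c(1, 2, NA), y = c(NA, "b", NA))

na_matrix <- is.na(df)      # logical matrix with the same shape as df
total_na <- sum(na_matrix)  # total number of NA cells in the data frame

print(dim(na_matrix))
print(total_na)
```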
#4
13
Try the colSums function:
df <- data.frame(x = c(1,2,NA), y = rep(NA, 3))
colSums(is.na(df))
#x y
#1 3
#5
11
In its output, summary() also counts the NAs, so one can use this function if one wants the number of NAs in several variables.
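A small sketch of what that looks like, using invented data. For numeric columns the count appears as an "NA's" entry; note that summary() of a character column only reports length, class and mode, so this approach is best suited to numeric (or factor) columns.

```r
# Hypothetical numeric data frame with missing values in both columns.
df <- data.frame(x = c(1, 2, NA), y = c(NA, NA, 5))

# For a data frame, each numeric column with missing values gets an "NA's" row.
print(summary(df))

# For a single numeric vector, the count is stored in the "NA's" element.
s <- summary(df$x)
na_in_x <- unclass(s)[["NA's"]]
print(na_in_x)
```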
#6
6
This form, slightly changed from Kevin Ogoros's one:
na_count <- function(x) sapply(x, function(y) sum(is.na(y)))
returns the NA counts as a named integer vector.
#7
6
A tidyverse way to count the number of NA values in every column of a dataframe:
library(tidyverse)
library(purrr)
df %>%
  map_df(function(x) sum(is.na(x))) %>%  # one NA count per column
  gather(feature, num_nulls) %>%         # reshape to one row per column
  print(n = 100)
#8
3
User rrs's answer is right, but it only tells you the number of NA values in the particular column of the data frame that you pass. To get the number of NA values for every column of the whole data frame, try this:
apply(df, 2, function(x) sum(is.na(x)))  # df is your data frame; 2 means "apply over columns"
This does the trick
#9
2
Try this:
length(df$col[is.na(df$col)])
#10
2
I read a CSV file from a local directory. The following code works for me:
# columnName holds the name of the column you want, as a string
# number of rows where that column is NA
sum(is.na(df[, c(columnName)]))
# number of rows where that column is not NA
sum(!is.na(df[, c(columnName)]))
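A self-contained sketch of the same idea, with an inline data frame instead of a CSV file. The column name "score" and the values are hypothetical; df[[columnName]] is used here because it always returns the column as a plain vector.

```r
# Hypothetical data standing in for the CSV contents.
df <- data.frame(score = c(10, NA, 30, NA))
columnName <- "score"  # assumed column name, stored as a string

n_missing <- sum(is.na(df[[columnName]]))   # rows where the column is NA
n_present <- sum(!is.na(df[[columnName]]))  # rows where the column has a value

print(c(missing = n_missing, present = n_present))
```

The two counts always add up to nrow(df), which makes a quick sanity check.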
#11
0
You can use this to count the number of NAs or blanks in every column:
colSums(is.na(data_set_name) | data_set_name == '')
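A quick sketch of how this behaves, on invented data (the column names name and city are hypothetical). Where a value is NA, the comparison with '' is itself NA, but TRUE | NA evaluates to TRUE, so those cells are still counted.

```r
# Hypothetical data frame mixing NA values and empty strings.
data_set_name <- data.frame(
  name = c("a", "", NA, "d"),
  city = c("", "", "x", NA),
  stringsAsFactors = FALSE
)

# TRUE for every cell that is either NA or an empty string.
missing_or_blank <- is.na(data_set_name) | data_set_name == ""

# Column-wise totals of those cells.
counts <- colSums(missing_or_blank)
print(counts)
```

Strings that contain only whitespace are not treated as blank by this comparison.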