Finding rows with a "Not Applicable" value in a specific column of a Graphlab SFrame

Date: 2021-06-07 13:02:16

Given a Graphlab.SFrame object with the following column names:

>>> import graphlab
>>> sf = graphlab.SFrame.read_csv('some.csv')
>>> sf.column_names()
['Dataset', 'Domain', 'Score', 'Sent1', 'Sent2']

One can easily drop the rows with a "not applicable" (NA) / None value in a particular column. For example, to drop the rows with NA values in the "Score" column, I could do this:

>>> sf.dropna('Score')

Or to replace the None value with a certain value (let's say -1), I could do this:

>>> sf.fillna('Score', -1)

After checking the SFrame docs at https://dato.com/products/create/docs/generated/graphlab.SFrame.html, I could not find a built-in function that returns the rows containing None in a certain column, something like sf.findna('Score'). I may have missed it.

If there is such a function, what is it called?

If there isn't, how should I extract the rows where a specified column has an NA value?

1 answer

#1



I think you can use a boolean mask to identify the rows with missing values in a given column. Comparing the column to None gives an array of the same length, with 1 where the value is missing and 0 elsewhere:

>>> import graphlab
>>> sf = graphlab.SFrame({'a': [1, 2, None, 4],
...                       'b': [None, 3, 1, None]})
>>> mask = sf['a'] == None
>>> mask
dtype: int
Rows: 4
[0, 0, 1, 0]
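
To get from the mask to the rows themselves, the mask is then used as a row filter; with an SFrame that is written sf[mask], which keeps only the rows where the mask is 1. The plain-Python sketch below mirrors that mask-then-filter logic on a dict of lists, so it can be run without graphlab installed. The helper name find_na_rows and the dict-of-lists layout are assumptions made for illustration and are not part of the SFrame API.

```python
def find_na_rows(columns, name):
    """Return (mask, rows) for the rows where column `name` is None.

    `columns` is a dict mapping column names to equal-length lists,
    standing in for an SFrame (an assumption of this sketch).
    """
    # 1 marks a missing value and 0 a present one, like the int mask above.
    mask = [1 if value is None else 0 for value in columns[name]]

    # Keep only the rows whose mask entry is 1 (the sf[mask] step).
    names = list(columns)
    rows = [
        {col: columns[col][i] for col in names}
        for i, flag in enumerate(mask)
        if flag
    ]
    return mask, rows


columns = {'a': [1, 2, None, 4], 'b': [None, 3, 1, None]}
mask, rows = find_na_rows(columns, 'a')
print(mask)
print(rows)
```

Inverting the mask (keeping the rows where the entry is 0) gives the complement, which is the set of rows that dropna would keep for that column.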
