Given a Graphlab.SFrame object with the following column names:
>>> import graphlab
>>> sf = graphlab.SFrame.read_csv('some.csv')
>>> sf.column_names()
['Dataset', 'Domain', 'Score', 'Sent1', 'Sent2']
Dropping the rows that have a "not applicable" (NA) / None value in a particular column is easy. For example, to drop the rows with NA values in the "Score" column, I could do this:
>>> sf.dropna('Score')
Or, to replace the None values with a specific value (say, -1), I could do this:
>>> sf.fillna('Score', -1)
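To make the semantics of the two calls above concrete, here is a plain-Python sketch with no GraphLab dependency. It assumes a hypothetical toy table stored as a list of row dicts (not an SFrame), and the helpers `dropna` and `fillna` are illustrative stand-ins, not the GraphLab API.

```python
# Hypothetical toy table: a list of row dicts standing in for an SFrame.
rows = [
    {"Dataset": "d1", "Score": 0.5},
    {"Dataset": "d2", "Score": None},
    {"Dataset": "d3", "Score": 1.0},
]


def dropna(rows, column):
    """Return only the rows whose value in `column` is not None."""
    return [row for row in rows if row[column] is not None]


def fillna(rows, column, value):
    """Return copies of the rows, with None in `column` replaced by `value`."""
    return [
        {**row, column: value if row[column] is None else row[column]}
        for row in rows
    ]


print(dropna(rows, "Score"))
print(fillna(rows, "Score", -1))
```

Both helpers return new lists and leave `rows` untouched.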
After checking the SFrame docs at https://dato.com/products/create/docs/generated/graphlab.SFrame.html, I couldn't find a built-in function that finds the rows containing None in a certain column, something like sf.findna('Score'). Or possibly I have missed it.
If there is such a function, what is it called?
If there isn't, how should I extract the rows that have NA values in a specified column?
1 Answer
I think you can use a boolean array to identify the rows with missing values for a given column.
>>> import graphlab
>>> sf = graphlab.SFrame({'a': [1, 2, None, 4],
... 'b': [None, 3, 1, None]})
>>> mask = sf['a'] == None
>>> mask
dtype: int
Rows: 4
[0, 0, 1, 0]
>>> na_rows = sf[mask]  # logical filter: keep only the rows where 'a' is None
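The mask-then-filter idea in the answer can be sketched in plain Python, without GraphLab, to show what each step computes. This assumes a column-oriented dict holding the same toy data as the answer; `na_mask` and `logical_filter` are hypothetical helper names, not GraphLab functions.

```python
# Column-oriented stand-in for the answer's SFrame (plain Python, not GraphLab).
table = {"a": [1, 2, None, 4], "b": [None, 3, 1, None]}


def na_mask(table, column):
    """0/1 mask marking the rows where `column` is None (like sf[column] == None)."""
    return [1 if value is None else 0 for value in table[column]]


def logical_filter(table, mask):
    """Keep only the rows whose mask entry is 1, in every column."""
    return {
        name: [value for value, keep in zip(values, mask) if keep]
        for name, values in table.items()
    }


mask = na_mask(table, "a")
print(mask)                         # [0, 0, 1, 0]
print(logical_filter(table, mask))  # {'a': [None], 'b': [1]}
```

Because the mask is built from one column but applied to every column, the whole row is kept or dropped together, which is the behaviour a `findna`-style helper would need.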