How to count an element in each list in a DataFrame column with pandas?

Time: 2022-06-18 15:51:28

Given a data frame df like this:

0     1
1     [12]
1     [13]
2     [11,12]
1     [10,0,1]
....

I'd like to count occurrences of a certain value, for instance '12', in each list of df. So I tried:

df.apply(list.count('12'))

but got the error: TypeError: descriptor 'count' requires a 'list' object but received a 'str'. But the values in df[1] are lists! How can I fix this? Thanks!

3 solutions

#1


1  

I think you can first select the column as a Series and then apply the function x.count(12):

import pandas as pd

d = {0: pd.Series([1, 1, 2, 1]),
     1: pd.Series([[12], [13], [11, 12], [10, 0, 1]])}

df = pd.DataFrame(d)

print(df)
   0           1
0  1        [12]
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

# .ix was removed in pandas 1.0; .loc is the label-based replacement
print(df.loc[:, 1])
0          [12]
1          [13]
2      [11, 12]
3    [10, 0, 1]
Name: 1, dtype: object

print(df.loc[:, 1].apply(lambda x: x.count(12)))
0    1
1    0
2    1
3    0
Name: 1, dtype: int64

Or select the column by position with iloc:

print(df.iloc[:, 1].apply(lambda x: x.count(12)))
0    1
1    0
2    1
3    0
Name: 1, dtype: int64
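
In pandas 1.0 and later the .ix indexer no longer exists, so column selection goes through [], .loc, or .iloc. The following is a minimal sketch, assuming the same four-row frame as above and Python 3; the variable names by_key, by_label, and by_pos are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({0: [1, 1, 2, 1],
                   1: [[12], [13], [11, 12], [10, 0, 1]]})

# Three ways to get column 1 as a Series
by_key = df[1]            # by column label
by_label = df.loc[:, 1]   # label-based: all rows, column labelled 1
by_pos = df.iloc[:, 1]    # position-based: all rows, second column

# Count how many times 12 occurs in each list
counts = by_label.apply(lambda x: x.count(12))
print(counts.tolist())
```

Here the column label and the column position both happen to be 1, which is why .loc and .iloc return the same data; with other labels the two would differ.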

EDIT:

I think column 1 contains NaN. You can filter those rows out with notnull before applying count:

print(df)
   0           1
0  1         NaN
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

print(df.loc[:, 1].notnull())
0    False
1     True
2     True
3     True
Name: 1, dtype: bool

# .loc takes the boolean mask for rows and the label 1 for the column
print(df.loc[df.loc[:, 1].notnull(), 1].apply(lambda x: x.count(12)))
1    0
2    1
3    0
Name: 1, dtype: int64
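
Filtering with notnull drops the NaN row from the result. If every row should stay in the output, one option is to return 0 for cells that are not lists. This is a sketch under that assumption; count_in_cell is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 1, 2, 1],
                   1: [np.nan, [13], [11, 12], [10, 0, 1]]})


def count_in_cell(cell, target):
    """Hypothetical helper: occurrences of target in a list cell, 0 otherwise (e.g. NaN)."""
    return cell.count(target) if isinstance(cell, list) else 0


# Every row keeps its index label; the NaN row simply counts as 0
counts = df[1].apply(lambda cell: count_in_cell(cell, 12))
print(counts.tolist())
```

Whether a missing cell should count as 0 or be excluded depends on what the result is used for, so treat this as one possible convention.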

EDIT2:

If you want to filter by index (e.g. 0:2) and by NaN in column 1:

print(df)
   0           1
0  1         NaN
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

# filter df by index: only labels 0 to 2 (.loc slices include the end label)
print(df.loc[0:2, 1])
0         NaN
1        [13]
2    [11, 12]
Name: 1, dtype: object

# boolean Series: True where the filtered column is not null
print(df.loc[0:2, 1].notnull())
0    False
1     True
2     True
Name: 1, dtype: bool

# get column 1: first restrict to index 0:2, then keep the non-null values
print(df.loc[0:2, 1][df.loc[0:2, 1].notnull()])
1        [13]
2    [11, 12]
Name: 1, dtype: object

# same as above, but more readable
df1 = df.loc[0:2, 1]
print(df1)
0         NaN
1        [13]
2    [11, 12]
Name: 1, dtype: object

print(df1[df1.notnull()])
1        [13]
2    [11, 12]
Name: 1, dtype: object

# apply count
print(df1[df1.notnull()].apply(lambda x: x.count(12)))
1    0
2    1
Name: 1, dtype: int64

#2


1  

The count has to be applied to each list in the column.

import pandas as pd

# Test data
df = pd.DataFrame({1: [[1], [12], [13], [11, 12], [10, 0, 1]]})

df[1].apply(lambda x: x.count(12))

0    0
1    1
2    0
3    1
4    0
Name: 1, dtype: int64
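
As for why the original attempt failed: list.count('12') is evaluated immediately, before apply ever sees a row, and it calls the method on the list type with the string '12' standing in for the list itself. The plain-Python sketch below illustrates this; the variable names are only for illustration, and the exact wording of the TypeError message varies between Python versions.

```python
cells = [[12], [13], [11, 12], [10, 0, 1]]

# list.count expects the list as its first argument, so passing only '12'
# raises TypeError regardless of what the DataFrame contains.
try:
    list.count('12')
    raised = False
except TypeError:
    raised = True

# Wrapping the call in a function defers it until a cell is available,
# which is what apply(lambda x: x.count(12)) does for each row.
counts = [(lambda x: x.count(12))(cell) for cell in cells]

# The lists hold integers, so counting the string '12' finds nothing.
string_hits = [12].count('12')

print(raised, counts, string_hits)
```

So two things needed fixing in the question's code: pass a callable to apply, and count the integer 12 rather than the string '12'.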

A modification to handle the case where some values are not stored in a list:

# An example with values not stored in a list
df = pd.DataFrame({1: [12, [12], [13], [11, 12], [10, 0, 1], 1]})

_check = 12
# Count inside lists; compare bare scalars with the target directly
df[1].apply(lambda v: v.count(_check) if isinstance(v, list) else int(v == _check))

0    1
1    1
2    0
3    1
4    0
5    0
Name: 1, dtype: int64
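
A different route to the same per-row counts, not taken from the answer above, is to flatten the lists with Series.explode (available since pandas 0.25) and group on the repeated index. This is a sketch assuming a default integer index and the same mixed column of lists and bare scalars.

```python
import pandas as pd

s = pd.Series([12, [12], [13], [11, 12], [10, 0, 1], 1])

# explode() gives every list element its own row and repeats the original
# index label; scalar cells pass through unchanged.
flat = s.explode()

# Compare each element with the target, then add up the matches for each
# original row by grouping on the repeated index label.
counts = flat.eq(12).groupby(level=0).sum()
print(counts.tolist())
```

This avoids a Python-level lambda, but it builds an intermediate Series with one row per list element, so it trades memory for convenience.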

#3


0  

You can use a conditional generator expression:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({0: [1, 1, 2, 1, 1, 2], 1: [np.nan, [13], [11, 12], [10, 0, 1], [12], [np.nan, 12]]})
>>> target = 12
>>> sum(sub_list.count(target)
...     for sub_list in df.iloc[:, 1]
...     if not np.isnan(sub_list).all())
3

This is like the following conditional list comprehension:

>>> [sub_list.count(12) for sub_list in df.iloc[:, 1] if not np.isnan(sub_list).all()]
[0, 1, 0, 1, 1]

The difference is that the generator expression is evaluated lazily: each count is passed to sum() as it is produced instead of first building the entire list, so it generally uses less memory.
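
For comparison, both the per-row counts and the total can also be obtained with pandas methods. This is a sketch assuming the same six-row frame; per_row and total are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 1, 2, 1, 1, 2],
                   1: [np.nan, [13], [11, 12], [10, 0, 1], [12], [np.nan, 12]]})
target = 12

# dropna() removes the cell that is itself NaN; a list that merely
# contains NaN is kept because the cell is not missing.
per_row = df[1].dropna().apply(lambda cell: cell.count(target))

# Sum the per-row counts to get the overall number of occurrences
total = int(per_row.sum())
print(per_row.tolist(), total)
```

Unlike the generator expression, this keeps the original index labels on the per-row result, which can be handy when the counts need to be joined back onto df.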
