Pandas - Count the rows matching a function for each input row

Time: 2022-09-05 09:11:37

I have a dataframe that needs a new column. For each row, that column should hold a count of all the other rows in the table that meet a certain condition. The condition needs to take input from both the "input" row and the "output" row.


For example, if the dataframe described people, I might want a column that counts how many people are taller and lighter than the person in the current row.


I'd want to pass the height and weight of the current row, as well as the height and weight of each other row, into a function, so I can do something like:


def example_function(height1, weight1, height2, weight2):
    # True when person 1 is taller and lighter than person 2
    return height1 > height2 and weight1 < weight2

It would then sum up all the True values and put that sum in the column.


Is something like this possible?


Thanks in advance for any ideas!


Edit: Sample input:


id   name    height   weight   country
0    Adam    70       180      USA
1    Bill    65       190      CANADA
2    Chris   71       150      GERMANY
3    Eric    72       210      USA
4    Fred    74       160      FRANCE
5    Gary    75       220      MEXICO
6    Henry   61       230      SPAIN

The result would need to be:


id   name    height   weight   country   new_column
0    Adam    70       180      USA       1
1    Bill    65       190      CANADA    1
2    Chris   71       150      GERMANY   3
3    Eric    72       210      USA       1
4    Fred    74       160      FRANCE    4
5    Gary    75       220      MEXICO    1
6    Henry   61       230      SPAIN     0

I believe it will need to be some sort of function, as the actual logic I need to use is more complicated.


Edit 2: fixed typo


3 Answers



Booleans can be summed, so you can count the rows that satisfy a condition like this:


count = ((df.height1 > df.height2) & (df.weight1 < df.weight2)).sum()


After testing it a bit, I changed the conditions and wrapped them in a custom function:


def f(x):
    #check boolean mask 
    #print ((df.height > x.height) & (df.weight < x.weight))
    return ((df.height < x.height) & (df.weight > x.weight)).sum()

df['new_column'] = df.apply(f, axis=1)
print (df)
   id   name  height  weight  country  new_column
0   0   Adam      70     180      USA           2
1   1   Bill      65     190   CANADA           1
2   2  Chris      71     150  GERMANY           3
3   3   Eric      72     210      USA           1
4   4   Fred      74     160   FRANCE           4
5   5   Gary      75     220   MEXICO           1
6   6  Henry      61     230    SPAIN           0


Each row is compared against the whole dataframe, and summing the resulting True values gives the count.
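The same count can also be computed without `apply`, by comparing every row against every other row in one step. The following is only a sketch using NumPy broadcasting on the sample data from the question; the names `h`, `w`, and `mask` are illustrative, and the condition (shorter and heavier) is the one used in `f` above. It builds an n-by-n boolean matrix, so it is only suitable when the table is small enough for that to fit in memory.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["Adam", "Bill", "Chris", "Eric", "Fred", "Gary", "Henry"],
    "height": [70, 65, 71, 72, 74, 75, 61],
    "weight": [180, 190, 150, 210, 160, 220, 230],
})

h = df["height"].to_numpy()
w = df["weight"].to_numpy()

# mask[i, j] is True when row j is shorter and heavier than row i.
# h[:, None] has shape (n, 1) and h[None, :] has shape (1, n),
# so the comparison broadcasts to an (n, n) matrix.
mask = (h[None, :] < h[:, None]) & (w[None, :] > w[:, None])

# Summing across each row of the matrix counts the matches for row i.
df["new_column"] = mask.sum(axis=1)
print(df)
```

For a more complicated condition, only the line that builds `mask` would need to change, as long as the condition can be written with element-wise operators.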




Quoting the question: "For example, if it was a dataframe describing people, and I wanted to make a column that counted how many people were taller than the current row and lighter."


As far as I understand, you want to assign something like this to a new column:


df['num_heigher_and_leighter'] = df.apply(lambda r: ((df.height > r.height) & (df.weight < r.weight)).sum(), axis=1)

However, your text description doesn't seem to match your expected result. Following the description (taller and lighter), the outcome is:


0    2
1    3
2    0
3    1
4    0
5    0
6    6
dtype: int64


As in any other case, you can use a named function instead of a lambda:


df = ...

def foo(r):
    return ((df.height > r.height) & (df.weight < r.weight)).sum()

df['num_heigher_and_leighter'] = df.apply(foo, axis=1)
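Since the question mentions that the real logic is more complicated, it may help to keep the condition separate from the counting. The sketch below is one possible way to do that, not an established pandas idiom: `count_matches` and `taller_and_lighter` are hypothetical helper names, and excluding the current row via `row.name` assumes the index labels are unique.

```python
import pandas as pd

def count_matches(df, condition):
    """For each row, count the other rows for which condition(row, df) is True."""
    def count_for(row):
        mask = condition(row, df)          # boolean Series over all rows
        mask = mask & (df.index != row.name)  # never count the row itself
        return int(mask.sum())
    return df.apply(count_for, axis=1)

def taller_and_lighter(row, others):
    # True for every row in `others` that is taller and lighter than `row`
    return (others["height"] > row["height"]) & (others["weight"] < row["weight"])

# Sample data from the question
df = pd.DataFrame({
    "name": ["Adam", "Bill", "Chris", "Eric", "Fred", "Gary", "Henry"],
    "height": [70, 65, 71, 72, 74, 75, 61],
    "weight": [180, 190, 150, 210, 160, 220, 230],
})

df["num_taller_and_lighter"] = count_matches(df, taller_and_lighter)
print(df)
```

Swapping in a different condition function changes what is counted without touching the counting code.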



I'm assuming you had a typo and want to compare heights with heights and weights with weights. If so, you could count the number of people who are both taller AND heavier like so:


>>> for i,height,weight in zip(df.index,df.height, df.weight):
...     cnt = df.loc[((df.height>height) & (df.weight>weight)), 'height'].count()
...     df.loc[i,'thing'] = cnt
>>> df
    name  height  weight  country  thing
0   Adam      70     180      USA    2.0
1   Bill      65     190   CANADA    2.0
2  Chris      71     150  GERMANY    3.0
3   Eric      72     210      USA    1.0
4   Fred      74     160   FRANCE    1.0
5   Gary      75     220   MEXICO    0.0
6  Henry      61     230    SPAIN    0.0

Here, for instance, nobody is heavier than Henry and nobody is taller than Gary, so both get a count of 0. If that's not what you intended, it should be easy to change the & above to a | or to switch a > to a <.
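The counts in the output above appear as floats (2.0, 1.0, ...) because the `thing` column is created one cell at a time. As a sketch of one way to avoid that, the counts can be collected in a list first and assigned in a single step; the condition (taller and heavier) and the column name `thing` are taken from the loop above, and `counts` is just an illustrative name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["Adam", "Bill", "Chris", "Eric", "Fred", "Gary", "Henry"],
    "height": [70, 65, 71, 72, 74, 75, 61],
    "weight": [180, 190, 150, 210, 160, 220, 230],
})

# One count per row: how many rows are both taller and heavier.
counts = [
    int(((df["height"] > height) & (df["weight"] > weight)).sum())
    for height, weight in zip(df["height"], df["weight"])
]

# Assigning the whole list at once keeps the column as integers.
df["thing"] = counts
print(df)
```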


When you're more accustomed to Pandas, I suggest you use Ami Tavory's excellent answer instead.


PS. For the love of god, use the Metric system for representing weight and height, and convert to whatever for presentation. These numbers are totally nonsensical for the world population at large. :)



