Pandas - Count of rows satisfying a function for each input row

Time: 2022-09-05 09:11:37

I have a dataframe that needs a new column. For each row, that column should hold a count of all the other rows in the table that meet a certain condition. The condition needs to take input from both the "input" row and the "output" row.

For example, if it were a dataframe describing people, I might want a column that counts how many people are taller and lighter than the person in the current row.

I'd want the height and weight of the current row, as well as the height and weight of each other row, passed to a function, so I can do something like:

def example_function(height1, weight1, height2, weight2):
    # True when row 1 is taller and lighter than row 2
    return height1 > height2 and weight1 < weight2

It would then sum up all the True values and store that sum in the column.

Is something like this possible?

Thanks in advance for any ideas!

Edit: Sample input:

id   name    height   weight   country
0    Adam    70       180      USA
1    Bill    65       190      CANADA
2    Chris   71       150      GERMANY
3    Eric    72       210      USA
4    Fred    74       160      FRANCE
5    Gary    75       220      MEXICO
6    Henry   61       230      SPAIN

The result would need to be:

id   name    height   weight   country   new_column
0    Adam    70       180      USA       1
1    Bill    65       190      CANADA    1
2    Chris   71       150      GERMANY   3
3    Eric    72       210      USA       1
4    Fred    74       160      FRANCE    4
5    Gary    75       220      MEXICO    1
6    Henry   61       230      SPAIN     0

I believe it will need to be some sort of function, as the actual logic I need to use is more complicated.

Edit 2: fixed typo

3 Solutions

#1


2  

You can sum booleans, like this:

count = ((df.height1 > df.height2) & (df.weight1 < df.weight2)).sum()

EDIT:

I tested it a bit and then changed the conditions, using a custom function:

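def f(x):
    # x is the current row; the comparison builds a boolean mask over all rows of df
    # uncomment to inspect the mask:
    # print((df.height < x.height) & (df.weight > x.weight))
    # count the rows that are shorter and heavier than the current row
    return ((df.height < x.height) & (df.weight > x.weight)).sum()

df['new_column'] = df.apply(f, axis=1)
print(df)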
   id   name  height  weight  country  new_column
0   0   Adam      70     180      USA           2
1   1   Bill      65     190   CANADA           1
2   2  Chris      71     150  GERMANY           3
3   3   Eric      72     210      USA           1
4   4   Fred      74     160   FRANCE           4
5   5   Gary      75     220   MEXICO           1
6   6  Henry      61     230    SPAIN           0

Explanation:

For each row, the function compares that row's values with every row of the dataframe, which gives a boolean mask. Summing the mask counts the True values.
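To make the snippet above reproducible, here is a self-contained sketch that rebuilds the sample table from the question and applies the same row-wise count. The helper name `count_shorter_and_heavier` is made up for illustration; the column names and values come from the sample input.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Adam", "Bill", "Chris", "Eric", "Fred", "Gary", "Henry"],
    "height": [70, 65, 71, 72, 74, 75, 61],
    "weight": [180, 190, 150, 210, 160, 220, 230],
    "country": ["USA", "CANADA", "GERMANY", "USA", "FRANCE", "MEXICO", "SPAIN"],
})

def count_shorter_and_heavier(row):
    # Hypothetical helper name. Builds a boolean mask over the whole frame:
    # True where another person is shorter AND heavier than the person in `row`.
    mask = (df["height"] < row["height"]) & (df["weight"] > row["weight"])
    # True counts as 1, so the sum is the number of matching rows.
    return int(mask.sum())

df["new_column"] = df.apply(count_shorter_and_heavier, axis=1)
print(df["new_column"].tolist())  # [2, 1, 3, 1, 4, 1, 0]
```

Because both comparisons are strict, a row never matches itself, so no extra step is needed to exclude the current row.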

#2


1  

For example, if it were a dataframe describing people, I might want a column that counts how many people are taller and lighter than the person in the current row.

As far as I understand, you want to assign something like this to a new column:

df['num_heigher_and_leighter'] = df.apply(lambda r: ((df.height > r.height) & (df.weight < r.weight)).sum(), axis=1)

However, your text description doesn't seem to match your expected result. The code above produces:

0    2
1    3
2    0
3    1
4    0
5    0
6    6
dtype: int64

Edit

As in any other case, you can use a named function instead of a lambda:

df = ...

def foo(r):
    return ((df.height > r.height) & (df.weight < r.weight)).sum()

df['num_heigher_and_leighter'] = df.apply(foo, axis=1)
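Since the question says the real logic is more complicated, the same pattern can be wrapped around an arbitrary pairwise function. The following is a rough sketch rather than a tuned solution: `count_matches` is a hypothetical helper, `example_function` is taken from the question with the current row passed as the first pair of arguments, and the index is assumed to be unique. It calls the Python function once per pair of rows, so it will be slow on large frames.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Adam", "Bill", "Chris", "Eric", "Fred", "Gary", "Henry"],
    "height": [70, 65, 71, 72, 74, 75, 61],
    "weight": [180, 190, 150, 210, 160, 220, 230],
})

def example_function(height1, weight1, height2, weight2):
    # Predicate from the question: row 1 is taller and lighter than row 2.
    return height1 > height2 and weight1 < weight2

def count_matches(frame, predicate):
    # Hypothetical helper: for each row, count the OTHER rows for which
    # predicate(current row, other row) is true. Assumes a unique index.
    counts = []
    for idx, cur in frame.iterrows():
        others = frame.drop(index=idx)  # exclude the current row itself
        counts.append(sum(
            bool(predicate(cur["height"], cur["weight"], h, w))
            for h, w in zip(others["height"], others["weight"])
        ))
    return pd.Series(counts, index=frame.index)

df["new_column"] = count_matches(df, example_function)
print(df["new_column"].tolist())  # [2, 1, 3, 1, 4, 1, 0]
```

With the current row in the first argument position, the predicate counts people who are shorter and heavier than the current person. Swapping the argument order inside `count_matches` would count people who are taller and lighter instead.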

#3


0  

I'm assuming you had a typo and want to compare heights with heights and weights with weights. If so, you could count the number of people who are both taller and heavier like so:

>>> for i,height,weight in zip(df.index,df.height, df.weight):
...     cnt = df.loc[((df.height>height) & (df.weight>weight)), 'height'].count()
...     df.loc[i,'thing'] = cnt
...
>>> df
    name  height  weight  country  thing
0   Adam      70     180      USA    2.0
1   Bill      65     190   CANADA    2.0
2  Chris      71     150  GERMANY    3.0
3   Eric      72     210      USA    1.0
4   Fred      74     160   FRANCE    1.0
5   Gary      75     220   MEXICO    0.0
6  Henry      61     230    SPAIN    0.0

Here, for instance, nobody is heavier than Henry and nobody is taller than Gary, so both get a count of 0. If that's not what you intended, it should be easy to change the & above to a |, or to switch a > to a <.
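The loop above can also be written without iterating in Python, by comparing every row with every other row through NumPy broadcasting. This is only a sketch, assuming the table is small enough for an n-by-n boolean matrix to fit in memory; the variable and column names are illustrative. It shows both the `&` version used above and the `|` variant mentioned in the text.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Adam", "Bill", "Chris", "Eric", "Fred", "Gary", "Henry"],
    "height": [70, 65, 71, 72, 74, 75, 61],
    "weight": [180, 190, 150, 210, 160, 220, 230],
})

h = df["height"].to_numpy()
w = df["weight"].to_numpy()

# Entry [i, j] is True when person j is taller (or heavier) than person i.
taller = h[None, :] > h[:, None]
heavier = w[None, :] > w[:, None]

# Summing along each row counts the matching people for person i.
df["taller_and_heavier"] = (taller & heavier).sum(axis=1)
df["taller_or_heavier"] = (taller | heavier).sum(axis=1)

print(df["taller_and_heavier"].tolist())  # [2, 2, 3, 1, 1, 0, 0]
print(df["taller_or_heavier"].tolist())   # [6, 6, 6, 3, 5, 1, 6]
```

The `&` column matches the `thing` column shown above, apart from being stored as integers rather than floats.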

When you're more accustomed to Pandas, I suggest you use Ami Tavory's excellent answer instead.

PS. For the love of god, use the Metric system for representing weight and height, and convert to whatever for presentation. These numbers are totally nonsensical for the world population at large. :)
