Confused about the usage of .apply and lambda

Date: 2022-08-24 21:02:29

After encountering this code:

I was confused about the usage of both .apply and lambda. Firstly, does .apply apply the desired change to all elements of all the specified columns at once, or to each column one by one? Secondly, does x in lambda x: iterate through every element of the specified columns, or through the columns one at a time? Thirdly, does x.min or x.max give the minimum or maximum of all the elements in the specified columns, or the minimum and maximum of each column separately? I would be more than grateful for any answer explaining the whole process.
Thanks.


2 solutions

#1



I think it is best here to avoid apply, since it loops under the hood, and instead work directly with a subset of the DataFrame selected by a list of columns:

import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'F': list('aaabbb')})

print(df)

# columns to normalise
c = ['B', 'C', 'D']

So first get the minimum of each selected column (the maximum works the same way):

print(df[c].min())
B    4
C    2
D    0
dtype: int64

Then subtract and divide:


print(df[c] - df[c].min())
   B  C  D
0  0  5  1
1  1  6  3
2  0  7  5
3  1  2  7
4  1  0  1
5  0  1  0

print(df[c].max() - df[c].min())
B    1
C    7
D    7
dtype: int64

df[c] = (df[c] - df[c].min()) / (df[c].max() - df[c].min())
print(df)
   A    B         C         D  E  F
0  a  0.0  0.714286  0.142857  5  a
1  b  1.0  0.857143  0.428571  3  a
2  c  0.0  1.000000  0.714286  6  a
3  d  1.0  0.285714  1.000000  9  b
4  e  1.0  0.000000  0.142857  2  b
5  f  0.0  0.142857  0.000000  4  b
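As a rough cross-check, the sketch below compares the vectorized expression with an apply + lambda version. The lambda is only an assumption about what the code in the question looked like (it is not shown above); the formula itself is the one used in this answer. The names `vectorized` and `applied` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0]})
c = ['B', 'C', 'D']

# Vectorized: min() and max() reduce each column to one value,
# and the subtraction/division align those values by column label.
vectorized = (df[c] - df[c].min()) / (df[c].max() - df[c].min())

# apply + lambda (assumed form of the question's code):
# x is one whole column (a Series) per call, not a single element.
applied = df[c].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# Raises AssertionError if the two results differ.
pd.testing.assert_frame_equal(vectorized, applied)
print(vectorized)
```

Both versions rescale each column with that column's own minimum and maximum; the vectorized one just skips the Python-level function call per column.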

EDIT:

To debug apply, it is best to create a custom function:

def f(x):
    # apply calls f once per column; x is that whole column as a Series
    print(x)
    # x.min() is a scalar: the minimum of this column only
    print(x.min())
    # the result is a new Series: the rescaled column
    print((x - x.min()) / (x.max() - x.min()))
    return (x - x.min()) / (x.max() - x.min())

df[c] = df[c].apply(f)
print(df)
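If printing whole columns is too noisy, a smaller sketch along the same lines is to record only the type and name of what each call receives. The helper `record` and the list `seen` are hypothetical names used just for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'B': [4, 5, 4], 'C': [7, 8, 9]})

seen = []  # what each call to the function received

def record(x):
    # Store the type name and the column label of the argument.
    seen.append((type(x).__name__, x.name))
    return x

# Default axis=0: the function is called with one column at a time.
df.apply(record)
print(seen)
```

Every entry should be a Series named after one of the columns, which is why x.min() and x.max() inside the function are per-column values. Some older pandas versions may call the function on the first column more than once, so the number of entries can vary.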

#2



Check whether the data are really being normalised, because x.min and x.max may simply take the min and max of a single value, in which case no normalisation would occur.
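One way to run that check is sketched below, assuming the same min-max formula as in answer #1 and made-up variable names. With DataFrame.apply, x is a whole column; the single-value case arises with Series.apply, where the function receives one element at a time.

```python
import pandas as pd

df = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3]})

# DataFrame.apply: x is a whole column, so x.min() and x.max() differ.
by_column = df.apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# Series.apply: the function receives one value at a time.
got_scalar = df['B'].apply(lambda v: pd.api.types.is_scalar(v))
print(got_scalar.all())

# A min-max normalised column should span exactly 0 to 1.
print(by_column.min().tolist(), by_column.max().tolist())
```

If the per-column minimum and maximum of the result are not 0 and 1, the function was probably not applied column by column.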
