I came across a piece of code and was confused about the usage of both .apply and lambda in it. Firstly, does .apply apply the desired change to all elements in all the specified columns at once, or to each column one by one? Secondly, does x in lambda x: iterate through every element in the specified columns, or through the columns one at a time? Thirdly, does x.min or x.max give the minimum or maximum over all elements in the specified columns, or the minimum and maximum of each column separately? Any answer explaining the whole process would make me more than grateful.
Thanks.
2 answers
#1
1
I think it is best to avoid apply here, because it loops under the hood. Instead, work with a subset of the DataFrame selected by a list of columns:
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'F': list('aaabbb')})
print(df)

# columns to normalise
c = ['B', 'C', 'D']
First compute the minimum of each selected column (the maximum works the same way):
print (df[c].min())
B 4
C 2
D 0
dtype: int64
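To make the "per column or over everything" point concrete, here is a small sketch contrasting the three possible minima. It uses only columns B, C and D from the sample data above; the variable names are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0]})

# Default axis=0: one minimum per column, returned as a Series indexed by column name
col_min = df.min()

# axis=1: one minimum per row instead
row_min = df.min(axis=1)

# A single scalar over every value requires reducing twice
overall_min = df.min().min()

print(col_min.to_dict())
print(row_min.tolist())
print(overall_min)
```

So df[c].min() on its own never mixes values from different columns.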
Then subtract the minimum and divide by the range:
print(df[c] - df[c].min())
   B  C  D
0  0  5  1
1  1  6  3
2  0  7  5
3  1  2  7
4  1  0  1
5  0  1  0
print (df[c].max() - df[c].min())
B 1
C 7
D 7
dtype: int64
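The subtraction works column by column because pandas aligns the Series of minima with the DataFrame's column labels. A minimal sketch of that alignment, assuming the same B, C and D data as above, with DataFrame.sub used to spell out the axis explicitly:

```python
import pandas as pd

df = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0]})

# DataFrame - Series matches the Series index ('B', 'C', 'D') to the column labels
shifted = df - df.min()

# The same operation with the alignment axis written out
explicit = df.sub(df.min(), axis='columns')

pd.testing.assert_frame_equal(shifted, explicit)

# After shifting, every column starts at zero
print(shifted.min().to_dict())
```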
df[c] = (df[c] - df[c].min()) / (df[c].max() - df[c].min())
print (df)
   A    B         C         D  E  F
0  a  0.0  0.714286  0.142857  5  a
1  b  1.0  0.857143  0.428571  3  a
2  c  0.0  1.000000  0.714286  6  a
3  d  1.0  0.285714  1.000000  9  b
4  e  1.0  0.000000  0.142857  2  b
5  f  0.0  0.142857  0.000000  4  b
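For comparison, the apply version can be checked against the vectorised expression. The question's own code is not reproduced on this page, so the lambda below is an assumed form of it; with DataFrame.apply, x is one whole column (a Series) per call.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0]})
c = ['B', 'C', 'D']

# Vectorised min-max scaling, as in the answer above
vectorized = (df[c] - df[c].min()) / (df[c].max() - df[c].min())

# Assumed form of the question's lambda: x.min() and x.max() are per-column scalars
applied = df[c].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# Raises an AssertionError if the two results differ
pd.testing.assert_frame_equal(applied, vectorized)
print(applied['B'].tolist())
```

Both approaches give the same numbers; the vectorised one simply skips the Python-level loop over columns.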
EDIT:
To debug apply, it is best to create a custom function:
def f(x):
    # apply passes one column (a Series) per call
    print(x)
    # x.min() returns a scalar: the minimum of that column
    print(x.min())
    # the expression returns a new Series: the rescaled column
    print((x - x.min()) / (x.max() - x.min()))
    return (x - x.min()) / (x.max() - x.min())

df[c] = df[c].apply(f)
print(df)
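If printing every column is too noisy, a variant of the same debugging idea is to record what each call receives. This is only a sketch: the seen list and the function name f_record are hypothetical helpers, and it deliberately does not assume an exact number of calls.

```python
import pandas as pd

df = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0]})
c = ['B', 'C', 'D']

seen = []  # hypothetical log of (column name, argument type, length)

def f_record(x):
    # x.name is the column label when the function is called by DataFrame.apply
    seen.append((x.name, type(x).__name__, len(x)))
    return (x - x.min()) / (x.max() - x.min())

result = df[c].apply(f_record)

for name, kind, length in seen:
    print(name, kind, length)
```

Each recorded entry is a full column of six values, which confirms that x is a column and not a single element.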
#2
1
Check whether the data are really being normalised: x.min and x.max may simply be taking the min and max of a single value, in which case no normalisation would occur.
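Whether x is a single value depends on which apply is called. A minimal sketch of the difference, assuming only column B from the sample data; the recording functions and lists are illustrative helpers:

```python
import pandas as pd

df = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4]})

frame_args = []   # what DataFrame.apply passes in
series_args = []  # what Series.apply passes in

def record_frame(x):
    frame_args.append(isinstance(x, pd.Series))
    return x

def record_series(x):
    series_args.append(isinstance(x, pd.Series))
    return x

# DataFrame.apply (note the double brackets): x is a whole column
df[['B']].apply(record_frame)

# Series.apply (single brackets): x is one element at a time
df['B'].apply(record_series)

print(all(frame_args), any(series_args), len(series_args))
```

So the concern applies when a single column is selected as a Series; with a list of columns, as in the first answer, each call sees the full column.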