如下所示:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
"""Computes softmax activations.
This function performs the equivalent of
softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis)
Args:
logits: A non-empty `Tensor`. Must be one of the following types: `half`,
`float32`, `float64`.
axis: The dimension softmax would be performed on. The default is -1 which
indicates the last dimension.
name: A name for the operation (optional).
dim: Deprecated alias for `axis`.
Returns:
A `Tensor`. Has the same type and shape as `logits`.
Raises:
InvalidArgumentError: if `logits` is empty or `axis` is beyond the last
dimension of `logits`.
"""
axis = deprecation.deprecated_argument_lookup( "axis" , axis, "dim" , dim)
if axis is None :
axis = - 1
return _softmax(logits, gen_nn_ops.softmax, axis, name)
|
softmax函数的返回结果和输入的tensor有相同的shape,既然没有改变tensor的形状,那么softmax究竟对tensor做了什么?
答案就是softmax会以某一个轴的下标为索引,对这一轴上其他维度的值进行 激活 + 归一化处理。
一般来说,这个索引轴都是表示类别的那个维度(tf.nn.softmax中默认为axis=-1,也就是最后一个维度)
举例:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
|
def softmax(X, theta = 1.0 , axis = None ):
"""
Compute the softmax of each element along an axis of X.
Parameters
----------
X: ND-Array. Probably should be floats.
theta (optional): float parameter, used as a multiplier
prior to exponentiation. Default = 1.0
axis (optional): axis to compute values along. Default is the
first non-singleton axis.
Returns an array the same size as X. The result will sum to 1
along the specified axis.
"""
# make X at least 2d
y = np.atleast_2d(X)
# find axis
if axis is None :
axis = next (j[ 0 ] for j in enumerate (y.shape) if j[ 1 ] > 1 )
# multiply y against the theta parameter,
y = y * float (theta)
# subtract the max for numerical stability
y = y - np.expand_dims(np. max (y, axis = axis), axis)
# exponentiate y
y = np.exp(y)
# take the sum along the specified axis
ax_sum = np.expand_dims(np. sum (y, axis = axis), axis)
# finally: divide elementwise
p = y / ax_sum
# flatten if X was 1D
if len (X.shape) = = 1 : p = p.flatten()
return p
c = np.random.randn( 2 , 3 )
print (c)
# 假设第0维是类别,一共有里两种类别
cc = softmax(c,axis = 0 )
# 假设最后一维是类别,一共有3种类别
ccc = softmax(c,axis = - 1 )
print (cc)
print (ccc)
|
结果:
1
2
3
4
5
6
7
8
9
|
c:
[[ - 1.30022268 0.59127472 1.21384177 ]
[ 0.1981082 - 0.83686108 - 1.54785864 ]]
cc:
[[ 0.1826746 0.80661068 0.94057075 ]
[ 0.8173254 0.19338932 0.05942925 ]]
ccc:
[[ 0.0500392 0.33172426 0.61823654 ]
[ 0.65371718 0.23222472 0.1140581 ]]
|
可以看到,对axis=0的轴做softmax时,输出结果在axis=0轴上和为1(eg: 0.1826746+0.8173254),同理在axis=1轴上做的话结果的axis=1轴和也为1(eg: 0.0500392+0.33172426+0.61823654)。
这些值是怎么得到的呢?
以cc为例(沿着axis=0做softmax):
以ccc为例(沿着axis=1做softmax):
知道了计算方法,现在我们再来讨论一下这些值的实际意义:
cc[0,0]实际上表示这样一种概率: P( label = 0 | value = [-1.30022268 0.1981082] = c[*,0] ) = 0.1826746
cc[1,0]实际上表示这样一种概率: P( label = 1 | value = [-1.30022268 0.1981082] = c[*,0] ) = 0.8173254
ccc[0,0]实际上表示这样一种概率: P( label = 0 | value = [-1.30022268 0.59127472 1.21384177] = c[0]) = 0.0500392
ccc[0,1]实际上表示这样一种概率: P( label = 1 | value = [-1.30022268 0.59127472 1.21384177] = c[0]) = 0.33172426
ccc[0,2]实际上表示这样一种概率: P( label = 2 | value = [-1.30022268 0.59127472 1.21384177] = c[0]) = 0.61823654
将他们扩展到更多维的情况:假设c是一个[batch_size , timesteps, categories]的三维tensor
output = tf.nn.softmax(c,axis=-1)
那么 output[1, 2, 3] 则表示 P(label =3 | value = c[1,2] )
以上这篇关于tensorflow softmax函数用法解析就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持服务器之家。
原文链接:https://blog.csdn.net/zongza/article/details/88016668