PyTorch - Loss Functions

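L1 Norm Loss

This is essentially the Manhattan distance. With the default reduction, it is the Manhattan distance between the two tensors divided by the number of elements.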

import torch 
import torch.nn as nn

x = torch.Tensor([1, 2, 3])
target = torch.Tensor([2, 2, 4])
criterion = nn.L1Loss()
loss = criterion(x, target)
loss   #  tensor(0.6667)

Calculation:

(|1-2| + |2-2| + |3-4|)/3 = 0.6667
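The arithmetic above can also be written as a few lines of plain Python. This is only a sketch for checking the formula by hand; `l1_loss` and its `reduction` argument are illustrative names that mirror, but are not, the PyTorch API.

```python
def l1_loss(xs, ys, reduction="mean"):
    """Mean (or summed) absolute error between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    total = sum(abs(x - y) for x, y in zip(xs, ys))
    if reduction == "sum":
        return total            # plain Manhattan distance
    return total / len(xs)      # averaged over the elements

print(round(l1_loss([1, 2, 3], [2, 2, 4]), 4))         # 0.6667
print(l1_loss([1, 2, 3], [2, 2, 4], reduction="sum"))  # 2
```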

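SmoothL1Loss

SmoothL1Loss is a variant of the L1 norm loss.
Where the absolute error is less than 1, it uses half of the squared error; where it is >= 1, it uses the absolute error minus 0.5.

Formula:
$Loss(x, y) = \frac{1}{N} \sum_{i=1}^{N} \begin{cases} \frac{1}{2}(x_i - y_i)^2, & |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5, & \text{otherwise} \end{cases}$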

Code:

x = torch.Tensor([1, 2, 3])
target = torch.Tensor([2, 2, 4])
criterion = nn.SmoothL1Loss()
loss = criterion(x, target)
loss # tensor(0.3333)
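To see where 0.3333 comes from, the piecewise rule can be applied element by element. The sketch below assumes the fixed threshold of 1 described above; `smooth_l1_loss` is a hypothetical helper, not the PyTorch implementation.

```python
def smooth_l1_loss(xs, ys):
    """Average of the piecewise smooth-L1 term with a threshold of 1."""
    total = 0.0
    for x, y in zip(xs, ys):
        diff = abs(x - y)
        if diff < 1:
            total += 0.5 * diff ** 2   # quadratic branch for small errors
        else:
            total += diff - 0.5        # linear branch for large errors
    return total / len(xs)

# Errors are 1, 0, 1, so the terms are 0.5, 0, 0.5
print(round(smooth_l1_loss([1, 2, 3], [2, 2, 4]), 4))  # 0.3333
# An error of 0.5 falls in the quadratic branch: 0.5 * 0.25
print(smooth_l1_loss([0.5], [0.0]))                    # 0.125
```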

Mean Squared Error Loss

Measures how far the estimates deviate from the quantities being estimated.

Formula:
$\frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2$

Code:

x = torch.Tensor([1, 2, 3])
target = torch.Tensor([2, 2, 4])
criterion = nn.MSELoss()
loss = criterion(x, target)
loss   #  tensor(0.6667)
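MSELoss returns the same 0.6667 as L1Loss on this input only because every error here is 0 or 1, and squaring leaves those unchanged. A small plain-Python comparison (helper names are illustrative) makes the difference visible once an error is larger than 1:

```python
def mse_loss(xs, ys):
    """Mean of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def l1_loss(xs, ys):
    """Mean of absolute differences."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Errors of 1, 0, 1: both losses agree
print(round(mse_loss([1, 2, 3], [2, 2, 4]), 4))  # 0.6667
print(round(l1_loss([1, 2, 3], [2, 2, 4]), 4))   # 0.6667

# Errors of 2, 0, 1: the squared loss penalizes the larger error more
print(round(mse_loss([1, 2, 3], [3, 2, 4]), 4))  # 1.6667
print(l1_loss([1, 2, 3], [3, 2, 4]))             # 1.0
```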

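Binary Cross-Entropy Loss

Formula:
$-\frac{1}{N} \sum_{i=1}^{N} [ t_i \log(o_i) + (1-t_i) \log(1-o_i) ]$

t is the label value.
o is the predicted output value.
The leading minus sign makes the overall loss positive, since the logarithm of a probability is never positive.

Code: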

target = torch.Tensor([0, 0, 1, 0, 1, 0])               # labels t
output = torch.Tensor([0.3, 0.2, 0.8, 0.5, 0.7, 0.2])   # predicted probabilities o
criterion = nn.BCELoss()
loss = criterion(output, target)   # argument order: (input, target)
loss   # tensor(0.3460)

Calculation:

import math
sum_loss = (1-0) * math.log(1-0.3) + (1-0) * math.log(1-0.2) + math.log(0.8) + (1-0) * math.log(1-0.5) + math.log(0.7) + (1-0) * math.log(1-0.2) 
loss = (-sum_loss)/6
loss # 0.34598795373000657

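Binary cross-entropy loss has a variant called BCEWithLogitsLoss, which builds the Sigmoid into the loss.
Compared with applying Sigmoid and BCELoss separately, BCEWithLogitsLoss is more numerically stable.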
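The stability difference can be illustrated without PyTorch. The sketch below compares the two-step computation with an algebraically equivalent single expression on a raw score z (a logit). It is an assumption that this rearrangement resembles what the library does internally; the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_naive(z, t):
    """Sigmoid followed by the BCE formula; fails once the sigmoid saturates."""
    o = sigmoid(z)
    return -(t * math.log(o) + (1 - t) * math.log(1 - o))

def bce_with_logits(z, t):
    """Same quantity rearranged so that log(0) is never evaluated."""
    return max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))

# For a moderate logit the two versions agree
print(abs(bce_naive(0.5, 1.0) - bce_with_logits(0.5, 1.0)) < 1e-12)  # True

# For a large logit, sigmoid(z) rounds to exactly 1.0 and log(1 - 1.0) is undefined
try:
    bce_naive(40.0, 0.0)
except ValueError:
    print("naive version fails: log(0)")
print(bce_with_logits(40.0, 0.0))  # 40.0
```

The rearranged form follows from log(sigmoid(z)) = -log(1 + exp(-z)), which turns the loss into z - z*t + log(1 + exp(-z)); taking max(z, 0) and exp(-|z|) keeps the exponential from overflowing for negative z.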

Cross-Entropy Loss with CrossEntropyLoss and NLLLoss

predict = torch.Tensor([[0.1, 0.5, 0.4], [0.1, 0.6, 0.1]])
label = torch.LongTensor([1, 2])
loss = nn.CrossEntropyLoss(reduction='none')
loss(predict, label) # tensor([0.9459, 1.2944]) 

loss = nn.CrossEntropyLoss(reduction='mean')
loss(predict, label) # tensor(1.1201)
 
loss = nn.CrossEntropyLoss(reduction='sum')
loss(predict, label) # tensor(2.2403)

CrossEntropyLoss can be decomposed into softmax, log, and NLLLoss:

import torch.nn.functional as F 

predict = torch.Tensor([[0.1, 0.5, 0.4], [0.1, 0.6, 0.1]])
label = torch.LongTensor([1, 2])

softmax = torch.softmax(predict, dim=1)
print('softmax : ', softmax)

_log = torch.log(softmax)
print('log : ', _log)

nll_loss = F.nll_loss(_log, label)
print('nll_loss : ', nll_loss)

'''
    softmax :  tensor([[0.2603, 0.3883, 0.3514],
            [0.2741, 0.4519, 0.2741]])
    log :  tensor([[-1.3459, -0.9459, -1.0459],
            [-1.2944, -0.7944, -1.2944]])
    nll_loss :  tensor(1.1201)
'''
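The same decomposition can be checked in plain Python. This sketch computes -log(softmax(logits)[label]) per sample with a log-sum-exp step; `cross_entropy` is an illustrative helper, and the inputs are the `predict` and `label` values used above.

```python
import math

def cross_entropy(logits, label):
    """-log(softmax(logits)[label]), computed via log-sum-exp."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

predict = [[0.1, 0.5, 0.4], [0.1, 0.6, 0.1]]
label = [1, 2]

losses = [cross_entropy(row, y) for row, y in zip(predict, label)]
print([round(v, 4) for v in losses])        # [0.9459, 1.2944]  (reduction='none')
print(round(sum(losses) / len(losses), 4))  # 1.1201            (reduction='mean')
print(round(sum(losses), 4))                # 2.2403            (reduction='sum')
```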

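KL Divergence Loss

Also called relative entropy. It measures how far apart two distributions are; the more similar the distributions, the closer the KL divergence is to 0.

Formula:
$D_{KL}(p \| q) = \sum_{i=1}^{N} p(x_i) \log \frac{p(x_i)}{q(x_i)}$

$p(x_i)$ is the probability under the true distribution.
$q(x_i)$ is the probability under the predicted output distribution.
KL divergence measures how much the predicted distribution deviates from the true distribution; if the two distributions match exactly, the value is 0.

Code: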

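predict = torch.Tensor([0.1, 0.3, 0.6])
label = torch.Tensor([0.1, 0.6, 0.3])   # labels are probabilities, so they need a float tensor
loss = nn.KLDivLoss(reduction='sum')    # 'sum' matches the formula above
loss(predict.log(), label) # tensor(0.2079); the input must be log-probabilities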
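As a cross-check, the formula can be evaluated directly for the same two distributions. This is a minimal sketch in plain Python with an illustrative helper name. Note that nn.KLDivLoss expects its first argument to be log-probabilities and its target to be a float tensor of probabilities, so the value it returns matches this sum only when called that way with reduction='sum'.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as probability lists."""
    # Terms with p_i == 0 contribute nothing, by the usual convention
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

label = [0.1, 0.6, 0.3]     # p: true distribution
predict = [0.1, 0.3, 0.6]   # q: predicted distribution

print(round(kl_divergence(label, predict), 4))  # 0.2079
print(kl_divergence(label, label))              # 0.0
```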

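Cosine Similarity Loss

The result does not depend on the length of the vectors, only on their direction.
It is typically used in positive space, where the cosine similarity lies between 0 and 1.

Formula:
$Loss(x, y) = \begin{cases} 1 - \cos(x, y), & label = 1 \\ \max(0, \cos(x, y) - margin), & label = -1 \end{cases}$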

x = torch.Tensor([[0.1, 0.5, 0.4], [0.1, 0.5, 0.4]])
y = torch.Tensor([[0.5, 0.4, 0.1], [0.1, 0.5, 0.4]])

label = torch.Tensor([-1, 1])
loss = nn.CosineEmbeddingLoss()
loss(x, y, label) # tensor(0.3452)
 
torch.cosine_similarity(x, y)
# tensor([0.6905, 1.0000])
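The value 0.3452 is the mean of the two per-pair losses. The sketch below reproduces it in plain Python; the helper names are illustrative, and the margin is subtracted in the label = -1 branch, following the PyTorch documentation (with the default margin of 0 the sign makes no difference).

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(u * v for u, v in zip(a, b))
    norm_a = math.sqrt(sum(u * u for u in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    return dot / (norm_a * norm_b)

def cosine_embedding_loss(a, b, label, margin=0.0):
    cos = cosine(a, b)
    if label == 1:
        return 1.0 - cos              # similar pair: push cosine toward 1
    return max(0.0, cos - margin)     # dissimilar pair: penalize cosine above margin

x = [[0.1, 0.5, 0.4], [0.1, 0.5, 0.4]]
y = [[0.5, 0.4, 0.1], [0.1, 0.5, 0.4]]
label = [-1, 1]

losses = [cosine_embedding_loss(a, b, t) for a, b, t in zip(x, y, label)]
print(round(sum(losses) / len(losses), 4))  # 0.3452
# With margin=0.5 the dissimilar pair is penalized only for the part above 0.5
print(round(cosine_embedding_loss(x[0], y[0], -1, margin=0.5), 4))  # 0.1905
```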

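Multi-Class Multi-Label Loss

For example, a skirt can be a long skirt, a short skirt, or a dress, and at the same time a pleated skirt or an A-line skirt.

Formula:
$Loss(x, y) = \frac{1}{N} \sum_{j} \sum_{i} \max(0, 1 - (x_{y_j} - x_i))$

Here j runs over the positions of y before the first -1 (the target labels), and i runs over every index of x that is not one of those target labels.
Both x and y are vectors of size N. Fixing the size of y at N makes it possible to handle samples with different numbers of labels.
-1 is a placeholder: the entries after it are not treated as target labels.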

loss = torch.nn.MultiLabelMarginLoss()
x = torch.FloatTensor([[0.1, 0.2, 0.4, 0.8, 1.1, 4, 7]])
y = torch.LongTensor([[5, 4, 3, 0, -1, 1, 2]])

loss(x, y) #  tensor(4.2571)
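To make the role of the -1 placeholder concrete, here is a plain-Python sketch that reproduces the value above for a single sample. `multilabel_margin_loss` is a hypothetical helper written from the description in this section, not the library code.

```python
def multilabel_margin_loss(x, y):
    """Hinge loss over every (target, non-target) pair, divided by len(x)."""
    targets = []
    for idx in y:
        if idx < 0:        # -1 ends the list of target labels
            break
        targets.append(idx)
    non_targets = [i for i in range(len(x)) if i not in targets]
    total = sum(
        max(0.0, 1.0 - (x[j] - x[i]))
        for j in targets
        for i in non_targets
    )
    return total / len(x)

x = [0.1, 0.2, 0.4, 0.8, 1.1, 4, 7]
y = [5, 4, 3, 0, -1, 1, 2]   # targets are 5, 4, 3, 0; entries after -1 are ignored

print(round(multilabel_margin_loss(x, y), 4))  # 4.2571
```

Here the targets are classes 5, 4, 3 and 0, the non-targets are classes 1, 2 and 6, and the twelve hinge terms sum to 29.8, which divided by N = 7 gives 4.2571.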

2023-11-14（六）
