Why is the 0-1 loss function intractable?

Time: 2021-10-13 01:44:32

In Ian Goodfellow's deep learning book, it is written that


Sometimes, the loss function we actually care about (say, classification error) is not one that can be optimized efficiently. For example, exactly minimizing expected 0-1 loss is typically intractable (exponential in the input dimension), even for a linear classifier. In such situations, one typically optimizes a surrogate loss function instead, which acts as a proxy but has advantages.


I do not understand why the 0-1 loss is intractable, or how it is exponential in the input dimensions.

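For concreteness, here is a minimal sketch (a toy example, not taken from the book) contrasting the 0-1 loss with one common convex surrogate, the logistic loss, for a one-dimensional linear classifier with score $wx$:

```python
# Toy comparison of the 0-1 loss and a convex surrogate (logistic loss)
# for a 1-D linear classifier with score s = w * x. All names here are
# illustrative; this is not code from the book.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)                     # toy 1-D inputs
y = np.sign(x + 0.3 * rng.normal(size=50))  # noisy labels in {-1, +1}

def zero_one_loss(w):
    # Fraction of misclassified points: mean of 1(y * w * x <= 0).
    return np.mean(y * w * x <= 0)

def logistic_loss(w):
    # Smooth convex surrogate: mean of log(1 + exp(-y * w * x)).
    return np.mean(np.log1p(np.exp(-y * w * x)))

for w in [-1.0, 0.1, 1.0, 10.0]:
    print(f"w = {w:5.1f}   0-1 loss: {zero_one_loss(w):.2f}   "
          f"logistic loss: {logistic_loss(w):.4f}")
```

Every positive $w$ classifies the points identically, so the 0-1 loss cannot distinguish $w = 0.1$ from $w = 10$, whereas the logistic loss can; this is what makes gradient-based optimization of the surrogate possible.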

1 Answer

#1



The 0-1 loss function is non-convex and discontinuous, so (sub)gradient methods cannot be applied. For binary classification with a linear separator, minimizing this loss can be formulated as finding the $\beta$ that minimizes the average value of the indicator function $\mathbf{1}(y_{i}\beta^{\top}\mathbf{x}_{i} \leq 0)$ over all $n$ samples. This is exponential in the inputs: each of the $n$ sample points can be classified either correctly or incorrectly, so there are $2^{n}$ possible configurations to check, and exactly minimizing this loss is known to be NP-hard. Moreover, knowing the current value of the loss function gives no clue about how to modify the current solution to improve it, unlike with convex or continuous functions, where the gradient tells you which direction to move.
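To see concretely why gradient information is useless here, the following sketch (my own illustration, not the answerer's code) computes a finite-difference gradient of the empirical 0-1 loss: because the loss is piecewise constant in $\beta$, the gradient is zero at almost every point, so a gradient step has nothing to follow.

```python
# Sketch of why gradients fail on the empirical 0-1 loss: the loss is
# piecewise constant in beta, so a finite-difference gradient is zero
# almost everywhere. This is an illustration, not the answerer's code.
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 2
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0])
y = np.sign(X @ beta_true + 0.5 * rng.normal(size=n))  # labels in {-1, +1}

def empirical_01_loss(beta):
    # Average of the indicator 1(y_i * beta^T x_i <= 0) over the n samples.
    return np.mean(y * (X @ beta) <= 0)

beta = np.array([0.5, 0.5])
eps = 1e-6
# Central finite differences in each coordinate. Unless beta happens to sit
# exactly on one of the finitely many hyperplanes y_i * x_i^T beta = 0,
# both perturbed losses are identical and every component comes out zero.
grad = np.array([
    (empirical_01_loss(beta + eps * e) - empirical_01_loss(beta - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
print("0-1 loss at beta:", empirical_01_loss(beta))
print("finite-difference gradient:", grad)
```

With the gradient flat almost everywhere, an exact minimizer is left with the combinatorial search over correct/incorrect configurations described above.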

