This article details the RAECost and SoftmaxCost classes introduced in the previous post.
SoftmaxCost
As we have seen, given the features and labels (with hyperparameters fixed), the SoftmaxCost class measures the error $cost$ of a given weight matrix ($hidden\times catSize$) and computes the gradient with respect to those weights. Let's look at the code.
@Override
public double valueAt(double[] x)
{
    if (!requiresEvaluation(x))
        return value;

    int numDataItems = Features.columns;
    int[] requiredRows = ArraysHelper.makeArray(0, CatSize - 2);
    ClassifierTheta Theta = new ClassifierTheta(x, FeatureLength, CatSize);
    DoubleMatrix Prediction = getPredictions(Theta, Features);

    double MeanTerm = 1.0 / (double) numDataItems;
    double Cost = getLoss(Prediction, Labels).sum() * MeanTerm;
    double RegularisationTerm = 0.5 * Lambda * DoubleMatrixFunctions.SquaredNorm(Theta.W);

    DoubleMatrix Diff = Prediction.sub(Labels).muli(MeanTerm);
    DoubleMatrix Delta = Features.mmul(Diff.transpose());
    DoubleMatrix gradW = Delta.getColumns(requiredRows);
    DoubleMatrix gradb = ((Diff.rowSums()).getRows(requiredRows));

    // Regularizing. Bias does not have one.
    gradW = gradW.addi(Theta.W.mul(Lambda));

    Gradient = new ClassifierTheta(gradW, gradb);
    value = Cost + RegularisationTerm;
    gradient = Gradient.Theta;
    return value;
}

public DoubleMatrix getPredictions(ClassifierTheta Theta, DoubleMatrix Features)
{
    int numDataItems = Features.columns;
    DoubleMatrix Input = ((Theta.W.transpose()).mmul(Features)).addColumnVector(Theta.b);
    Input = DoubleMatrix.concatVertically(Input, DoubleMatrix.zeros(1, numDataItems));
    return Activation.valueAt(Input);
}
This is a typical two-layer neural network with no hidden layer: it predicts labels from the features, normalizes the predictions with softmax, and then backpropagates the error to obtain the weight gradients.
In this network the label is a column vector with a 1 at the target label and 0 elsewhere; the transfer function is softmax, so the output is a probability for each label.
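Note the zeros row that getPredictions appends before applying the activation: the last category's logit is pinned at 0, so only $catSize-1$ columns of weights are actually learned (which is why requiredRows runs from 0 to CatSize-2). Below is a minimal sketch of this trick using jblas; the matrix sizes and values are illustrative assumptions, not taken from the project.

import org.jblas.DoubleMatrix;
import org.jblas.MatrixFunctions;

public class SoftmaxSketch
{
    // Column-wise softmax: each column of input holds one sample's logits.
    static DoubleMatrix softmax(DoubleMatrix input)
    {
        DoubleMatrix exp = MatrixFunctions.exp(input);
        return exp.diviRowVector(exp.columnSums()); // each column now sums to 1
    }

    public static void main(String[] args)
    {
        int numDataItems = 2;
        // Logits for the catSize-1 "free" categories, one column per sample.
        DoubleMatrix logits = new DoubleMatrix(new double[][] {
            { 1.0, -0.5 },
            { 0.2,  0.3 } });
        // Pin the last category's logit at 0, as getPredictions does.
        DoubleMatrix full = DoubleMatrix.concatVertically(
            logits, DoubleMatrix.zeros(1, numDataItems));
        System.out.println(softmax(full)); // 3 x 2 matrix of label probabilities
    }
}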
The loss is computed by getLoss. If the predicted output for the target label is $p^*$, the per-sample cost (error function) is:
$$cost=E(p^*)=-\log(p^*)$$
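The body of getLoss is not shown here; a hedged reconstruction of what it plausibly computes is the element-wise product of $-\log$ of the predictions with the one-hot labels, so that only the target label's $-\log(p^*)$ survives in each column. Summing the result and scaling by MeanTerm, as valueAt does, then gives the average loss.

// Hypothetical reconstruction of getLoss: cross-entropy against one-hot labels.
// prediction and labels are both catSize x numDataItems.
static DoubleMatrix getLoss(DoubleMatrix prediction, DoubleMatrix labels)
{
    // -log(p) kept only where the one-hot label is 1; all other entries are 0.
    return MatrixFunctions.log(prediction).muli(labels).negi();
}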
Applying the backpropagation rule described earlier (the middle steps below take $j$ to be the target label; for any other $j$, $label_j=0$ and the final expression still holds), we obtain:
$$\frac{\partial E}{\partial w_{ij}}=\frac{\partial E}{\partial p_j}\frac{\partial p_j}{\partial net_j}x_i=-\frac{1}{p_j}p_j(1-p_j)x_i=-(1-p_j)x_i=-(label_j-p_j)feature_i$$
This explains the meaning of the following line of code:
DoubleMatrix Delta = Features.mmul(Diff.transpose());
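A practical way to verify that this analytic gradient matches the cost is a central-difference check. The sketch below assumes valueAt as defined above plus a derivativeAt accessor returning the gradient array; both accessor names are assumptions here, not confirmed parts of the project's interface.

// Central-difference gradient check (hypothetical accessors on the cost object).
static double maxGradientError(SoftmaxCost cost, double[] x)
{
    double eps = 1e-6, worst = 0.0;
    double[] analytic = cost.derivativeAt(x).clone(); // assumed accessor
    for (int i = 0; i < x.length; i++)
    {
        double saved = x[i];
        x[i] = saved + eps; double plus  = cost.valueAt(x);
        x[i] = saved - eps; double minus = cost.valueAt(x);
        x[i] = saved;
        double numeric = (plus - minus) / (2 * eps);
        worst = Math.max(worst, Math.abs(numeric - analytic[i]));
    }
    return worst; // should be tiny (~1e-8) if the backprop gradient is right
}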
RAECost
First, the implementation:
@Override
public double valueAt(double[] x)
{
    if (!requiresEvaluation(x))
        return value;

    Theta Theta1 = new Theta(x, hiddenSize, visibleSize, dictionaryLength);
    FineTunableTheta Theta2 = new FineTunableTheta(x, hiddenSize, visibleSize, catSize, dictionaryLength);
    Theta2.setWe(Theta2.We.add(WeOrig));

    final RAEClassificationCost classificationCost = new RAEClassificationCost(
        catSize, AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, Theta2);
    final RAEFeatureCost featureCost = new RAEFeatureCost(
        AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, WeOrig, Theta1);

    Parallel.For(DataCell,
        new Parallel.Operation<LabeledDatum<Integer,Integer>>() {
            public void perform(int index, LabeledDatum<Integer,Integer> Data)
            {
                try {
                    LabeledRAETree Tree = featureCost.Compute(Data);
                    classificationCost.Compute(Data, Tree);
                } catch (Exception e) {
                    System.err.println(e.getMessage());
                }
            }
        });

    double costRAE = featureCost.getCost();
    double[] gradRAE = featureCost.getGradient().clone();
    double costSUP = classificationCost.getCost();
    gradient = classificationCost.getGradient();

    value = costRAE + costSUP;
    for (int i = 0; i < gradRAE.length; i++)
        gradient[i] += gradRAE[i];

    // Aggressive GC hints between evaluations, as in the original source.
    System.gc(); System.gc();
    System.gc(); System.gc();
    System.gc(); System.gc();
    System.gc(); System.gc();

    return value;
}
The cost consists of two parts: featureCost and classificationCost. The program iterates over every sample; featureCost.Compute(Data) builds a recursive autoencoder tree for the sample while accumulating cost and gradient, and classificationCost.Compute(Data, Tree) then computes and accumulates cost and gradient over the resulting tree. The key classes are therefore RAEFeatureCost and RAEClassificationCost.
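Since Parallel.For invokes perform on many samples concurrently, featureCost and classificationCost must accumulate their running cost and gradient in a thread-safe way. Below is a hedged sketch of that accumulation pattern; the class and method names are illustrative, not the project's actual ones.

// Illustrative thread-safe accumulator for per-sample costs and gradients.
class CostAccumulator
{
    private double cost = 0.0;
    private final double[] gradient;

    CostAccumulator(int numParams) { gradient = new double[numParams]; }

    // Called concurrently by Parallel.For workers for each sample.
    synchronized void add(double sampleCost, double[] sampleGrad)
    {
        cost += sampleCost;
        for (int i = 0; i < gradient.length; i++)
            gradient[i] += sampleGrad[i];
    }

    synchronized double getCost() { return cost; }
    synchronized double[] getGradient() { return gradient.clone(); }
}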
In its Compute method, RAEFeatureCost calls RAEPropagation's ForwardPropagate to build the tree, then calls BackPropagate to compute and accumulate the gradients. The details of these algorithms are left to the next chapter.