After getting to the library this morning, I started preparing for training. First I copied Andrew Ng's code, starting with the sigmoid function:
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.
% You need to return the following variables correctly
g = zeros(size(z));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
% vector or scalar).
g = 1.0 ./ (1.0 + exp(-z));
% =============================================================
end
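A quick sanity check at the prompt: sigmoid should map 0 to 0.5 and saturate toward 0 and 1 at the extremes, for example:
octave> sigmoid([-10 0 10])   % should come out roughly as [0.0000 0.5000 1.0000]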
And the costFunctionReg function, which computes the objective J and its gradient:
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
% J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
n = size(theta, 1); % number of dimensions
J = (1 / m) * sum(-y' * log(sigmoid(X * theta)) - (1 - y') * log(1 - sigmoid(X * theta))) ...
    + (lambda / (2 * m)) * (sum(theta .^ 2) - theta(1) ^ 2);
grad = (1 / m) * (X' * (sigmoid(X * theta) - y)) + (lambda / m) * theta;
grad(1) = grad(1) - (lambda / m) * theta(1); % the intercept term theta(1) is not regularized
% =============================================================
end
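Since fminunc will rely on this gradient, it is worth checking it once against a finite-difference approximation on a tiny example before running the real training; a minimal sketch (all toy values below are invented purely for illustration):
% toy data, invented purely for checking the gradient
X_toy = [1 0.5; 1 -1.2; 1 2.0; 1 0.1]; % 4 examples, intercept column included
y_toy = [1; 0; 1; 0];
theta_toy = [0.1; -0.2];
lambda = 1;
[J, grad] = costFunctionReg(theta_toy, X_toy, y_toy, lambda);
% two-sided finite-difference approximation of the gradient
step = 1e-4;
num_grad = zeros(size(theta_toy));
for j = 1:numel(theta_toy)
  e = zeros(size(theta_toy));
  e(j) = step;
  num_grad(j) = (costFunctionReg(theta_toy + e, X_toy, y_toy, lambda) ...
                 - costFunctionReg(theta_toy - e, X_toy, y_toy, lambda)) / (2 * step);
end
disp([grad num_grad]) % the two columns should agree to several decimal places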
And the predict function, used to make predictions once theta has been trained:
function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
% p = PREDICT(theta, X) computes the predictions for X using a
% threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)
m = size(X, 1); % Number of training examples
% You need to return the following variables correctly
p = zeros(m, 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned logistic regression parameters.
% You should set p to a vector of 0's and 1's
%
p = sigmoid(X * theta);
for i = 1:m
  if (p(i) >= 0.5)
    p(i) = 1;
  else
    p(i) = 0;
  end
end
% =========================================================================
end
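The same thresholding can also be written without the loop; a one-line equivalent using the same >= 0.5 rule:
p = double(sigmoid(X * theta) >= 0.5);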
I also wrote my own accuracy function to compute the classification accuracy:
function [accuracy] = accuracy(p, label)
accuracy = mean(double(p == label)) * 100;
end
At first I tried to train on the whole of norm_1, but I quickly realized how foolish that was, since I had no idea when the training would ever finish. So I started from 50 examples and worked my way up: 500 examples took about 5 minutes, and 5000 examples took a bit over two hours:
octave> [theta, J, exit_flag] = fminunc(@(t)(costFunctionReg(t, norm_1(1:5000, :), label_1(1:5000, :), 0.001)), initial_theta, options);
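For this call to work, initial_theta and options have to be defined beforehand; a minimal sketch along the lines of the course exercise (assuming norm_1 already contains the intercept column, and the MaxIter value is only a placeholder):
initial_theta = zeros(size(norm_1, 2), 1);           % one parameter per feature
options = optimset('GradObj', 'on', 'MaxIter', 400); % use the analytic gradient from costFunctionReg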
Then, using the resulting theta, I measured the accuracy on the norm_2 data, and it turned out to be only 43.580:
octave> accuracy(predict(theta, norm_2), label_2)
ans = 43.580
I did a rough calculation: a coin-flip classifier assigns 50% of examples to the positive class and 50% to the negative class. For data in which 10% of examples are positive, it classifies 10% * 50% of all examples correctly as positives and 90% * 50% correctly as negatives, so the coin-flip accuracy should be:
(10% * 50% + 90% * 50%) / 1 = 50%
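A quick simulation confirms this figure (the labels below are randomly generated just for illustration, with a 10% positive rate):
n = 100000;
labels = double(rand(n, 1) < 0.1);  % made-up labels, roughly 10% positive
guesses = double(rand(n, 1) < 0.5); % coin-flip classifier
mean(guesses == labels) * 100       % should come out close to 50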
Then I went to look at the LeaderBoard. At the moment first place is maternaj with a score of 0.62376, although that score covers both the regression and the classification parts, and the evaluation metric is MAE (Mean Absolute Error), computed as follows:
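MAE is the mean of the absolute differences between the predictions and the true values, i.e. MAE = (1/n) * sum(|predicted_i - actual_i|); in Octave that is simply something like:
score = mean(abs(predicted - actual)); % variable names here are hypothetical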
In other words, the lower the score, the higher the ranking.
After that setback, I decided to switch to SVM. While looking for an SVM toolbox I came across this site:
http://www.support-vector-machines.org/SVM_soft.html
It lists a number of well-regarded SVM implementations.
I remember Andrew Ng recommending LIBSVM in his lectures, so I will try that one first.
Then I started training:
load("~/norm_1.mat");load("~/label_1.mat");
model_1 = svmtrain(label_1, norm_1, "-s 2 -d 3 -b 0 -q 0")
To my surprise, a whole day went into the training and there was still no result.
In the afternoon I finally came to my senses: once again I had not started with a small dataset. So I reran everything from scratch and found that something was seriously wrong. While training step by step, I also recorded the time taken, the training accuracy, and the test accuracy:
% train
octave> model_1 = svmtrain(label_1(1:50), norm_1(1:50, :), "-s 2 -d 3 -b 0 -q 0")
% training accuracy
octave> [predicted_label, accuracy, prob_estimates] = svmpredict(label_1(1:50), norm_1(1:50, :), model_1);
% test accuracy
octave> [predicted_label, accuracy, prob_estimates] = svmpredict(label_2, norm_2, model_1);
The results were as follows:
+-------------+-------+----------------+---------------+
| data amount | time  | train accuracy | test accuracy |
+-------------+-------+----------------+---------------+
| 50          | 5s    | 4%             | 2.145%        |
| 100         | 15s   | 6%             | 2.54%         |
| 200         | 20s   | 6.5%           | 3.54%         |
| 400         | 1m    | 5.25%          | 3.275%        |
| 1000        | 4m    | 5%             | 4.225%        |
| 2000        | 7m    | 4.9%           | 4.44%         |
| 4000        | 18m   | 4.15%          | 4.31%         |
| 10000       | 1h48m | 4.5%           | 4.37%         |
+-------------+-------+----------------+---------------+
Note that the figures recorded here are accuracies, and they did not even reach double digits. Right now I feel really uneasy and have absolutely no idea what is going on...
A side note: while the training was running I had nothing to do, and while browsing Weibo I found that Mitchell, the author of Machine Learning, also has a public lecture video on semi-supervised learning. I plan to watch it when I have time:
http://videolectures.net/mlas06_mitchell_sla/