I'm trying to implement Gradient Descent (GD) (not the stochastic variant) for logistic regression in Python 3, and I'm having some trouble.
Logistic regression is defined as follows (1):

$$a(x) = \sigma(w_1 x_1 + w_2 x_2) = \frac{1}{1 + \exp(-w_1 x_1 - w_2 x_2)}$$

Formulas for the gradient steps are defined as follows (2):

$$w_1 := w_1 + k\,\frac{1}{\ell}\sum_{i=1}^{\ell} y_i x_{i1}\left(1 - \frac{1}{1 + \exp\bigl(-y_i (w_1 x_{i1} + w_2 x_{i2})\bigr)}\right) - k C w_1$$

$$w_2 := w_2 + k\,\frac{1}{\ell}\sum_{i=1}^{\ell} y_i x_{i2}\left(1 - \frac{1}{1 + \exp\bigl(-y_i (w_1 x_{i1} + w_2 x_{i2})\bigr)}\right) - k C w_2$$

where $\ell$ is the number of objects in the sample.
Description of data:
- X is an (N×2)-matrix of objects (consisting of positive and negative floats)
- y is an (N×1)-vector of class labels (−1 or +1)
Task: implement gradient descent 1) with L2-regularization and 2) without regularization. Desired results: the vectors of weights. Parameters: regularization rate C = 10 for the regularized regression and C = 0 for the unregularized one; gradient step k = 0.1; maximum number of iterations = 10000; tolerance = 1e-5. Note: GD has converged once the distance between the weight vectors from the current and previous steps is less than the tolerance (1e-5).
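In other words, taking the distance to be the Euclidean norm (which is what np.linalg.norm computes by default), the stopping rule is

$$\lVert w^{(t)} - w^{(t-1)} \rVert_2 < 10^{-5}$$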
Here is my implementation (k is the gradient step; C is the regularization rate):
import numpy as np

def sigmoid(z):
    result = 1./(1. + np.exp(-z))
    return result

def distance(vector1, vector2):
    vector1 = np.array(vector1, dtype='f')
    vector2 = np.array(vector2, dtype='f')
    return np.linalg.norm(vector1 - vector2)

def GD(X, y, C, k=0.1, tolerance=1e-5, max_iter=10000):
    X = np.matrix(X)
    y = np.matrix(y)
    l = len(X)
    w1, w2 = 0., 0.  # weights (see formulas (2) at the beginning of the question)
    difference = 1.
    iteration = 1
    while difference > tolerance:
        hypothesis = y*(X*np.matrix([w1, w2]).T)
        w1_updated = w1 + (k/l)*np.sum(y*X[:,0]*(1. - sigmoid(hypothesis))) - k*C*w1
        w2_updated = w2 + (k/l)*np.sum(y*X[:,1]*(1. - sigmoid(hypothesis))) - k*C*w2
        difference = distance([w1, w2], [w1_updated, w2_updated])
        w1, w2 = w1_updated, w2_updated
        if iteration >= max_iter:
            break
        iteration = iteration + 1
    return [w1_updated, w2_updated]  # vector of weights
The calls are, respectively:
# call for UNregularized GD: C=0
w = GD(X, y, C=0., k=0.1)
and
# call for regularized GD: C=10
w_reg = GD(X, y, C=10., k=0.1)
Here are the results (weight vectors):
# UNregularized GD
[0.035736331265589463, 0.032464572442830832]
# regularized GD
[5.0979561973044096e-06, 4.6312243707352652e-06]
However, it should be (the correct answers, for self-checking):
# UNregularized GD
[0.28801877, 0.09179177]
# regularized GD
[0.02855938, 0.02478083]
Please, can you tell me what's going wrong here? I've been stuck on this problem for three days in a row and still have no idea.
Thank you in advance.
1 Answer
#1
First of all, the sigmoid function should be
def sigmoid(Z):
    A = 1/(1 + np.exp(-Z))
    return A
Try to run it again with this formula. Then, what is L?
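For comparison, here is a minimal corrected sketch of the update from formulas (2), using plain NumPy arrays instead of np.matrix. The point it illustrates: with np.matrix, expressions such as y*(X*w.T) and y*X[:,0] are (1×N)·(N×1) matrix products that collapse to a single number, while the formulas require an elementwise product over all N objects. The function name gd_array and the vectorized form are illustrative choices, not the asker's original code:

import numpy as np

def sigmoid(z):
    return 1. / (1. + np.exp(-z))

def gd_array(X, y, C, k=0.1, tolerance=1e-5, max_iter=10000):
    X = np.asarray(X, dtype=float)          # (N, 2) matrix of objects
    y = np.asarray(y, dtype=float).ravel()  # (N,) labels, -1 or +1
    l = len(y)
    w = np.zeros(X.shape[1])                # weights start at zero
    for _ in range(max_iter):
        margin = y * (X @ w)                # elementwise: y_i * <w, x_i>, shape (N,)
        coef = y * (1. - sigmoid(margin))   # shape (N,)
        # X.T @ coef packs the two sums from formulas (2) into one vector
        w_new = w + (k / l) * (X.T @ coef) - k * C * w
        if np.linalg.norm(w_new - w) < tolerance:
            return w_new
        w = w_new
    return w

Called the same way as in the question: w = gd_array(X, y, C=0.) and w_reg = gd_array(X, y, C=10.).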