Inefficient regularized logistic regression with NumPy

Time: 2021-06-01 21:24:38

I am a machine learning noob attempting to implement regularized logistic regression via Newton's method.


The data have two features, which are supposed to be expanded to 28 features by forming all monomial terms of (u, v) up to degree 6.

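For context, the 28 terms can be generated with a pair of nested loops over the total degree; the sketch below produces the same set of monomials as the hand-written map_feature in my code further down, though not necessarily in the same order:

import numpy as np

def map_feature_generic(u, v, degree=6):
    # Emit u^(d-j) * v^j for every total degree d = 0..degree and j = 0..d,
    # i.e. 1, u, v, u^2, u*v, v^2, ... -> 28 terms when degree = 6.
    terms = []
    for d in range(degree + 1):
        for j in range(d + 1):
            terms.append((u ** (d - j)) * (v ** j))
    return np.array(terms)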

My code converges to the correct solution of norm(theta) = 0.9384, but only after roughly 500 iterations, when it should take only around 15 for lambda = 10 (the exercise is based on Matlab rather than Python). Each parameter-update cycle is also very slow, and I am not sure exactly why. If anyone could explain why my code takes so many iterations to converge, and why each iteration is painfully slow, I would be very grateful!

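For reference, these are the regularized cost, gradient, Hessian, and Newton update from the exercise (with h_theta the sigmoid function and theta_0 left out of the regularization), written in LaTeX:

J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

\nabla_\theta J = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)} + \frac{\lambda}{m}\left[0,\ \theta_1,\ \dots,\ \theta_n\right]^{T}

H = \frac{1}{m}\sum_{i=1}^{m} h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right)\, x^{(i)} x^{(i)\,T} + \frac{\lambda}{m}\,\mathrm{diag}(0, 1, \dots, 1)

\theta \leftarrow \theta - H^{-1}\nabla_\theta J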

The data are taken from Andrew Ng's open-course exercise 5. The problem description and data can be found at http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex5/ex5.html, although I have posted the data and my code below.


X data with two features


0.051267,0.69956
-0.092742,0.68494
-0.21371,0.69225
-0.375,0.50219
-0.51325,0.46564
-0.52477,0.2098
-0.39804,0.034357
-0.30588,-0.19225
0.016705,-0.40424
0.13191,-0.51389
0.38537,-0.56506
0.52938,-0.5212
0.63882,-0.24342
0.73675,-0.18494
0.54666,0.48757
0.322,0.5826
0.16647,0.53874
-0.046659,0.81652
-0.17339,0.69956
-0.47869,0.63377
-0.60541,0.59722
-0.62846,0.33406
-0.59389,0.005117
-0.42108,-0.27266
-0.11578,-0.39693
0.20104,-0.60161
0.46601,-0.53582
0.67339,-0.53582
-0.13882,0.54605
-0.29435,0.77997
-0.26555,0.96272
-0.16187,0.8019
-0.17339,0.64839
-0.28283,0.47295
-0.36348,0.31213
-0.30012,0.027047
-0.23675,-0.21418
-0.06394,-0.18494
0.062788,-0.16301
0.22984,-0.41155
0.2932,-0.2288
0.48329,-0.18494
0.64459,-0.14108
0.46025,0.012427
0.6273,0.15863
0.57546,0.26827
0.72523,0.44371
0.22408,0.52412
0.44297,0.67032
0.322,0.69225
0.13767,0.57529
-0.0063364,0.39985
-0.092742,0.55336
-0.20795,0.35599
-0.20795,0.17325
-0.43836,0.21711
-0.21947,-0.016813
-0.13882,-0.27266
0.18376,0.93348
0.22408,0.77997
0.29896,0.61915
0.50634,0.75804
0.61578,0.7288
0.60426,0.59722
0.76555,0.50219
0.92684,0.3633
0.82316,0.27558
0.96141,0.085526
0.93836,0.012427
0.86348,-0.082602
0.89804,-0.20687
0.85196,-0.36769
0.82892,-0.5212
0.79435,-0.55775
0.59274,-0.7405
0.51786,-0.5943
0.46601,-0.41886
0.35081,-0.57968
0.28744,-0.76974
0.085829,-0.75512
0.14919,-0.57968
-0.13306,-0.4481
-0.40956,-0.41155
-0.39228,-0.25804
-0.74366,-0.25804
-0.69758,0.041667
-0.75518,0.2902
-0.69758,0.68494
-0.4038,0.70687
-0.38076,0.91886
-0.50749,0.90424
-0.54781,0.70687
0.10311,0.77997
0.057028,0.91886
-0.10426,0.99196
-0.081221,1.1089
0.28744,1.087
0.39689,0.82383
0.63882,0.88962
0.82316,0.66301
0.67339,0.64108
1.0709,0.10015
-0.046659,-0.57968
-0.23675,-0.63816
-0.15035,-0.36769
-0.49021,-0.3019
-0.46717,-0.13377
-0.28859,-0.060673
-0.61118,-0.067982
-0.66302,-0.21418
-0.59965,-0.41886
-0.72638,-0.082602
-0.83007,0.31213
-0.72062,0.53874
-0.59389,0.49488
-0.48445,0.99927
-0.0063364,0.99927

Y data


1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

My code is below:


import pandas as pd
import numpy as np
import math

def sigmoid(theta, x):

    return 1/(1 + math.exp(-1*theta.T.dot(x)))


def cost_function(X, y, theta):

    s = 0
    for i in range(m):
        loss = -y[i]*np.log(sigmoid(theta, X[i])) - (1-y[i])*np.log(1-sigmoid(theta, X[i]))
        s += loss
    s /= m
    s += (lamb/(2*m))*sum(theta[j]**2 for j in range(1, 28)) 
    return s


def gradient(theta, X, y):
    # add regularization terms
    add_column = theta * (lamb/m)
    add_column[0] = 0
    a = sum((sigmoid(theta, X[i]) - y[i])*X[i] + add_column for i in range(m))/m
    return a


def hessian(theta, X, reg_matrix):

    matrix = []
    for i in range(28):
        row = []
        for j in range(28):
            cell = sum(sigmoid(theta, X[k])*(1-sigmoid(theta, X[k]))*X[k][i]*X[k][j] for k in range(m))
            row.append(cell)
        matrix.append(row)

    H = np.array(matrix)
    H = np.add(H, reg_matrix)
    return H


def newtons_method(theta, iterations):

    for i in range(iterations):
        g = gradient(theta, X, y)
        H = hessian(theta, X, reg_matrix)
        theta = theta - np.linalg.inv(H).dot(g)
        cost = cost_function(X,y,theta)
        print(cost)    
    return theta

def map_feature(u, v): # expand features according to problem instructions

    new_row = [] 
    new_row.append(1)
    new_row.append(u)
    new_row.append(v)
    new_row.append(u**2)
    new_row.append(u*v)
    new_row.append(v**2)
    new_row.append(u**3)
    new_row.append(u**2*v)
    new_row.append(u*v**2)
    new_row.append(v**3)
    new_row.append(u**4)
    new_row.append(u**3*v)
    new_row.append(u*v**3)
    new_row.append(v**4)
    new_row.append(u**2*v**2)
    new_row.append(u**5)
    new_row.append(u**4*v)
    new_row.append(u*v**4)
    new_row.append(v**5)
    new_row.append(u**2*v**3)
    new_row.append(u**3*v**2)
    new_row.append(u**6)
    new_row.append(u**5*v)
    new_row.append(u*v**5)
    new_row.append(v**6)
    new_row.append(u**4*v**2)
    new_row.append(u**2*v**4)
    new_row.append(u**3*v**3)
    return np.array(new_row)

with open('ex5Logx.dat', 'r') as f:
    array = []
    for line in f.readlines():
        array.append(line.strip().split(','))

    for a in array:

        a[0], a[1] = float(a[0]), float(a[1].strip())

    xdata= np.array(array)

with open('ex5Logy.dat', 'r') as f:
    array = []
    for line in f.readlines():
        array.append(line.strip())

    for i in range(len(array)):
        array[i] = float(array[i])
    ydata= np.array(array)


X_df = pd.DataFrame(xdata, columns=['score1', 'score2'])

y_df = pd.DataFrame(ydata, columns=['acceptence'])

m = len(y_df)

iterations = 15

ones = np.ones((m,1)) # intercept term in first column
X = np.array(X_df)
X = np.append(ones, X, axis=1)
y = np.array(y_df).flatten()

new_X = [] # prepare new array for expanded features
for i in range(m):
    new_row = map_feature(X[i][1], X[i][2])

    new_X.append(new_row)

X = np.array(new_X)

theta = np.array([0 for i in range(28)]) # initialize parameters to 0

lamb = 10 # lambda constant for regularization

reg_matrix = np.zeros((28,28),dtype=int) # n+1*n+1 regularization matrix 
np.fill_diagonal(reg_matrix, 1)
reg_matrix[0] = 0
reg_matrix = (lamb/m)*reg_matrix

theta = newtons_method(theta, iterations)
print(np.linalg.norm(theta))

1 answer

#1

I am not 100% sure, but I went through a tutorial on logistic regression with Newton's method (http://thelaziestprogrammer.com/sharrington/math-of-machine-learning/solving-logreg-newtons-method), and its implementation of Newton's method is a little different from yours. Actually, there is one major difference: it adds the product of the inverse Hessian and the gradient to theta, whereas you are subtracting it. I only know the usual way of fitting logistic regression, not Newton's method. Apart from that, you are using loops in the cost function and the Hessian, which I think can each be done with a single NumPy statement rather than a loop.


I would suggest referring to the link above, since it implements everything in Python with NumPy and uses no loops. The loops you have written are what is hurting performance.
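
In case it helps, here is a minimal loop-free sketch of the gradient, Hessian, and Newton step in NumPy, assuming X is the m x 28 expanded design matrix, y the 0/1 label vector, and reg_matrix the (lamb/m)-scaled regularization matrix your script already builds. It keeps your subtraction, which is the right sign when minimizing the cost:

import numpy as np

def sigmoid_vec(z):
    # element-wise logistic function applied to a whole vector of scores
    return 1.0 / (1.0 + np.exp(-z))

def newton_step(theta, X, y, reg_matrix, lamb, m):
    h = sigmoid_vec(X.dot(theta))             # predictions for all m samples at once
    reg_vec = (lamb / m) * theta
    reg_vec[0] = 0.0                          # intercept term is not regularized
    grad = X.T.dot(h - y) / m + reg_vec       # regularized gradient, shape (28,)
    w = h * (1.0 - h)                         # per-sample weights h*(1-h)
    H = (X.T * w).dot(X) / m + reg_matrix     # regularized Hessian, shape (28, 28)
    return theta - np.linalg.solve(H, grad)   # one Newton update

# theta = np.zeros(28)
# for _ in range(15):
#     theta = newton_step(theta, X, y, reg_matrix, lamb, m)

Note that the Hessian here is divided by m so that it sits on the same scale as the gradient, and np.linalg.solve is used instead of forming an explicit inverse, which is both faster and more numerically stable.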
