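My English is limited, so please bear with any inaccuracies in the exercise description.
I am a beginner in machine learning and some of my understanding may be incomplete; corrections and criticism are welcome.
1. Exercise description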
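You are given a dataset, ex1data1.txt. Its first column is the population of a city and its second column is the profit of a food truck in that city; a negative value indicates a loss. Apply univariate linear regression to predict profit from population.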
In this part of this exercise, you will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a
restaurant franchise and are considering different cities for opening a new
outlet. The chain already has trucks in various cities and you have data for
profits and populations from the cities.
You would like to use this data to help you select which city to expand
to next.
The file ex1data1.txt contains the dataset for our linear regression problem. The first column is the population of a city and the second column is
the profit of a food truck in that city. A negative value for profit indicates a
loss.
The ex1.m script has already been set up to load this data for you. (Original text: Andrew Ng, Machine Learning, ex1)
2. Mathematical background
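We want to fit the data with theta1*x+theta0, so we need a way to determine theta1 and theta0. The machine learning approach goes roughly as follows. Start from arbitrary values of theta1 and theta0; in this post both are initialized to 0. Compute the prediction h:=theta1*x+theta0, then evaluate the loss function loss:=1/(2*m)*∑((h-y)^2), where m is the number of samples. Gradient descent then adjusts the parameters: theta1:=theta1-alpha*∂loss/∂theta1=theta1-alpha*1/m*∑((h-y)*x), and likewise theta0:=theta0-alpha*∂loss/∂theta0=theta0-alpha*1/m*∑(h-y), where alpha is the learning rate. It does not matter much if the formulas are not fully clear, because tensorflow wraps the gradient descent algorithm. Repeating this update in a loop yields the final solution. To represent theta0 and theta1 as a single vector, we add a column of ones to the left of the original data.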
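The update rule above can also be sketched in plain NumPy, without tensorflow. This is only an illustration and not the code used in this post: the function names compute_loss and gradient_descent, the synthetic data, and the learning rate and step count chosen for it are all assumptions made for the example.

```python
import numpy as np

def compute_loss(X, y, theta):
    """loss = 1/(2*m) * sum((h - y)^2), with h = X @ theta."""
    m = y.shape[0]
    residual = X @ theta - y
    return float((residual ** 2).sum() / (2 * m))

def gradient_descent(X, y, alpha=0.015, steps=1000):
    """Batch gradient descent starting from theta = 0."""
    m = y.shape[0]
    theta = np.zeros((X.shape[1], 1))
    for _ in range(steps):
        gradient = X.T @ (X @ theta - y) / m  # d(loss)/d(theta), one entry per parameter
        theta = theta - alpha * gradient
    return theta

# Synthetic data (an assumption for illustration, not ex1data1.txt): y = 2*x + 1
x = np.linspace(0.0, 5.0, 50).reshape(-1, 1)
y = 2.0 * x + 1.0
X = np.c_[np.ones((x.shape[0], 1)), x]  # column of ones on the left, so theta[0] plays the role of theta0

theta = gradient_descent(X, y, alpha=0.05, steps=5000)
print(theta.ravel())              # fitted [theta0, theta1]
print(compute_loss(X, y, theta))  # loss at the fitted parameters
```

Because the synthetic data lie exactly on a line, the fitted theta should approach theta0=1 and theta1=2, which makes the sketch easy to sanity-check before moving to real data.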
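# -*- coding: utf-8 -*-
# Uses the TensorFlow 1.x graph/session API (tf.Session)
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

a = np.loadtxt('ex1data1.txt', delimiter=',')  # load the txt data
x = a[:, 0]  # x is the first column of the data
y = a[:, 1]  # y is the second column of the data
y = y.reshape(-1, 1).astype(np.float32)  # make y an explicit column vector, same dtype as theta
m = y.shape[0]  # number of rows
temp = np.ones((m, 1))
x = np.c_[temp, x]  # add a column of ones on the left, so no separate b is needed
theta = tf.Variable([[0.0], [0.0]])  # initialize theta, 2 rows by 1 column
x = x.astype(np.float32)  # make x the same dtype as theta
h = tf.matmul(x, theta)  # matrix product [97,2]*[2,1]=[97,1], i.e. theta1*x+theta0
loss = tf.reduce_mean(tf.square(h - y) / 2)  # compute the loss
optimizer = tf.train.GradientDescentOptimizer(0.015)  # gradient descent with learning rate 0.015
train = optimizer.minimize(loss)  # minimize the loss
init = tf.global_variables_initializer()  # replaces the deprecated tf.initialize_all_variables()
sess = tf.Session()  # open a session
sess.run(init)  # run the initializer
for step in range(1000):  # optimization loop
    sess.run(train)  # one training step
plt.plot(x[:, 1], y, 'rx', markersize=10)  # mark the original data points
plt.plot(x[:, 1], sess.run(h))  # draw the fitted line
plt.show()
print(sess.run(loss))  # print the current loss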
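Once theta has been fitted, predicting the profit for a new city is a single matrix product. The sketch below is an illustration only: predict is a hypothetical helper, and example_theta holds placeholder values rather than results obtained from ex1data1.txt. In the script above, the fitted values could be read with sess.run(theta).

```python
import numpy as np

def predict(theta, populations):
    """Return h = theta0 + theta1 * x for each population value."""
    x = np.asarray(populations, dtype=np.float64).reshape(-1, 1)
    X = np.c_[np.ones((x.shape[0], 1)), x]  # same column of ones as in training
    theta = np.asarray(theta, dtype=np.float64).reshape(2, 1)
    return (X @ theta).ravel()

# Placeholder parameters (an assumption, not values fitted on ex1data1.txt)
example_theta = [[-1.0], [0.5]]
print(predict(example_theta, [4.0, 10.0]))
```

The helper rebuilds the column of ones so that new inputs go through the same theta1*x+theta0 form used during training.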