Andrew Ng Deep Learning, Course 1 Week 3 Programming Assignment

Date: 2022-04-14 03:06:07

This is the Week 3 assignment. If you are not sure how to set up the environment, see the previous post:
http://blog.csdn.net/liuzhongkai123/article/details/78766351
The assignment document is copied over for this week as well.

Planar data classification with one hidden layer

Welcome to your week 3 programming assignment. It’s time to build your first neural network, which will have a hidden layer. You will see a big difference between this model and the one you implemented using logistic regression.
You will learn how to:
- Implement a 2-class classification neural network with a single hidden layer
- Use units with a non-linear activation function, such as tanh
- Compute the cross entropy loss
- Implement forward and backward propagation

1 - Packages

Let’s first import all the packages that you will need during this assignment.
- numpy is the fundamental package for scientific computing with Python.
- sklearn provides simple and efficient tools for data mining and data analysis.
- matplotlib is a library for plotting graphs in Python.
- testCases provides some test examples to assess the correctness of your functions
- planar_utils provides various useful functions used in this assignment

Import the packages and modules used in this week's assignment:

import numpy as np
import matplotlib.pyplot as plt
from testCases import *
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets

2 - Dataset

First, let’s get the dataset you will work on. The following code will load a “flower” 2-class dataset into variables X and Y.

The dataset is generated by load_planar_dataset(): it returns X, the 2-D coordinates of each point (computed from a radius and an angle), and Y, the color label (red or blue).

def load_planar_dataset():
    np.random.seed(1)
    m = 400 # number of examples
    N = int(m/2) # number of points per class (two classes: red and blue)
    D = 2 # dimensionality (the data is 2-D)
    X = np.zeros((m,D)) # data matrix where each row is a single example
    Y = np.zeros((m,1), dtype='uint8') # labels vector (0 for red, 1 for blue)
    a = 4 # maximum ray of the flower
    # np.linspace returns evenly spaced numbers over a specified interval
    for j in range(2):
        ix = range(N*j,N*(j+1)) # ix = range(0, 200), then range(200, 400)
        t = np.linspace(j*3.12,(j+1)*3.12,N) + np.random.randn(N)*0.2 # theta: N angles plus noise, so the points are spread unevenly
        r = a*np.sin(4*t) + np.random.randn(N)*0.2 # radius: 4*sin(4*t) plus noise, so the petal outline is not perfectly smooth
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)] # coordinates of each point
        Y[ix] = j # label: red (0) or blue (1)

    X = X.T
    Y = Y.T

    return X, Y
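In the notebook the dataset is then loaded with a single call, as in the original assignment; X ends up with shape (2, 400) and Y with shape (1, 400):

X, Y = load_planar_dataset()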

Plot the dataset as a scatter plot:

plt.scatter(X[0, :], X[1, :], c=np.squeeze(Y), s=40, cmap=plt.cm.Spectral)  # squeeze Y here, otherwise the color argument may raise an error

Result:

[Figure: scatter plot of the "flower" dataset]
You have:
- a numpy-array (matrix) X that contains your features (x1, x2)
- a numpy-array (vector) Y that contains your labels (red:0, blue:1).

Let's first get a better sense of what our data is like.

Exercise: How many training examples do you have? In addition, what is the shape of the variables X and Y?

Hint: How do you get the shape of a numpy array? (help)

### START CODE HERE ### (≈ 3 lines of code)
shape_X=np.shape(X)
shape_Y=np.shape(Y)
m = X.shape[1]  # number of training examples
### END CODE HERE ###
print ('The shape of X is: ' + str(shape_X))
print ('The shape of Y is: ' + str(shape_Y))
print ('I have m = %d training examples!' % (m))

Result:

The shape of X is: (2, 400)
The shape of Y is: (1, 400)
I have m = 400 training examples!

3 - Simple Logistic Regression

Before building a full neural network, let's first see how logistic regression performs on this problem. You can use sklearn's built-in functions to do that. Run the code below to train a logistic regression classifier on the dataset.

# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV()
# clf.fit(X.T, Y.T)
clf.fit(X.T, Y.T.ravel())  # ravel flattens the label array to 1-D

You can now plot the decision boundary of these models. Run the code below.

# Plot the decision boundary for logistic regression
# use the helper function to plot the classifier: a single straight line splits the plane into two regions
plot_decision_boundary(lambda x: clf.predict(x), X, Y)
plt.title("Logistic Regression")

# Print accuracy
LR_predictions = clf.predict(X.T)  # predicted labels Y_hat
print ('Accuracy of logistic regression: %d ' % float((np.dot(Y,LR_predictions) + np.dot(1-Y,1-LR_predictions))/float(Y.size)*100) +
       '% ' + "(percentage of correctly labelled datapoints)")  # Y·Y_hat counts true positives, (1-Y)·(1-Y_hat) counts true negatives

Result:

Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)

[Figure: logistic regression decision boundary on the flower dataset]
As you can see, logistic regression reaches only 47% accuracy: the dataset is not linearly separable, so a straight-line boundary cannot classify it correctly.
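The plot_decision_boundary helper used above comes from planar_utils and is not shown in the assignment. A rough sketch of what such a helper does, reconstructed from how it is called here (an assumption, not the exact course code):

def plot_decision_boundary_sketch(model, X, y):
    # model maps an (m, 2) array of points to predicted labels, e.g. lambda x: clf.predict(x)
    # X has shape (2, m) and y holds the labels
    x_min, x_max = X[0, :].min() - 1, X[0, :].max() + 1
    y_min, y_max = X[1, :].min() - 1, X[1, :].max() + 1
    h = 0.01  # grid step
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # predict a label for every grid point and colour the two regions
    Z = model(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[0, :], X[1, :], c=np.squeeze(y), s=40, cmap=plt.cm.Spectral)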

4 - Neural Network model

Logistic regression did not work well on the “flower dataset”. You are going to train a Neural Network with a single hidden layer.

Here is our model:
[Figure: the neural network model with one hidden layer]
Mathematically:
For one example $x^{(i)}$:

$$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$$
$$a^{[1](i)} = \tanh(z^{[1](i)})$$
$$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$$
$$\hat{y}^{(i)} = a^{[2](i)} = \sigma(z^{[2](i)})$$
$$y^{(i)}_{prediction} = \begin{cases} 1 & \text{if } a^{[2](i)} > 0.5 \\ 0 & \text{otherwise} \end{cases}$$
Given the predictions on all the examples, you can also compute the cost J as follows:
$$J = -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\log\left(a^{[2](i)}\right) + (1-y^{(i)})\log\left(1-a^{[2](i)}\right) \right)$$
Reminder: The general methodology to build a Neural Network is to:
1. Define the neural network structure ( # of input units, # of hidden units, etc).
2. Initialize the model’s parameters
3. Loop:
- Implement forward propagation
- Compute loss
- Implement backward propagation to get the gradients
- Update parameters (gradient descent)

You often build helper functions to compute steps 1-3 and then merge them into one function we call nn_model(). Once you’ve built nn_model() and learnt the right parameters, you can make predictions on new data.
This emphasizes the importance of the model function: once you have defined the structure, initialized the parameters, and implemented the cost and gradient computations, collect them into a single function so it can easily be reused on new data.

4.1 - Defining the neural network structure

Exercise: Define three variables:
- n_x: the size of the input layer
- n_h: the size of the hidden layer (set this to 4)
- n_y: the size of the output layer

Hint: Use shapes of X and Y to find n_x and n_y. Also, hard code the hidden layer size to be 4.

def layer_sizes(X,Y):
    n_x=X.shape[0]
    n_y=Y.shape[0]
    n_h=4
    return (n_x, n_h, n_y)
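A quick sanity check on the flower dataset (X of shape (2, 400), Y of shape (1, 400)):

n_x, n_h, n_y = layer_sizes(X, Y)
print(n_x, n_h, n_y)  # prints: 2 4 1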

4.2 - Initialize the model's parameters

Exercise: Implement the function initialize_parameters().

Instructions:
- Make sure your parameters’ sizes are right. Refer to the neural network figure above if needed.
- You will initialize the weights matrices with random values.
- Use: np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
- You will initialize the bias vectors as zeros.
- Use: np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.

def initialize_parameters(n_x,n_h,n_y):
    np.random.seed(2)
    W1=np.random.randn(n_h,n_x)*0.01
    b1=np.zeros((n_h,1))
    W2=np.random.randn(n_y,n_h)*0.01
    b2=np.zeros((n_y,1))

    assert(W1.shape==(n_h,n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))

    parameters={'W1':W1,
                'b1':b1,
                'W2':W2,
                'b2':b2}
    return parameters

The parameters are returned in a dictionary.

4.3 - The Loop

Question: Implement forward_propagation().

Instructions:
- Look above at the mathematical representation of your classifier.
- You can use the function sigmoid(). It is built-in (imported) in the notebook; a minimal sketch of it is given right after this list.
- You can use the function np.tanh(). It is part of the numpy library.
- The steps you have to implement are:
1. Retrieve each parameter from the dictionary “parameters” (which is the output of initialize_parameters()) by using parameters[“..”].
2. Implement Forward Propagation. Compute Z[1],A[1],Z[2] and A[2] (the vector of all your predictions on all the examples in the training set).
- Values needed in the backpropagation are stored in “cache“. The cache will be given as an input to the backpropagation function.
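sigmoid() is imported from planar_utils rather than defined in the notebook. A minimal equivalent, assuming the standard logistic function (the course helper may differ in details):

def sigmoid(z):
    # standard logistic function 1 / (1 + e^(-z)), applied element-wise
    return 1 / (1 + np.exp(-z))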
Define the forward propagation function:

def forward_propagation(X,parameters):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    Z1=np.dot(W1,X)+b1
    A1=np.tanh(Z1)
    Z2=np.dot(W2,A1)+b2
    A2=sigmoid(Z2)

    assert(A2.shape==(1,X.shape[1]))
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}

    return A2, cache

Now that you have computed A[2] (in the Python variable “A2“), which contains a[2](i) for every example, you can compute the cost function as follows:
Compute the cost function:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\log\left(a^{[2](i)}\right) + (1-y^{(i)})\log\left(1-a^{[2](i)}\right) \right)$$

Exercise: Implement compute_cost() to compute the value of the cost J.

def compute_cost(A2,Y,parameters):
    m=Y.shape[1]
    cost=-1/m*np.sum(np.multiply(Y,np.log(A2))+np.multiply((1-Y),np.log(1-A2)))

    cost=np.squeeze(cost)# makes sure cost is the dimension we expect. 
                        # E.g., turns [[17]] into 17 

    assert(isinstance(cost,float))
    return cost

Question: Implement the function backward_propagation().
This is the core of the neural network computation: apply the chain rule to derive the expressions for dW and db (don't forget to divide by m) and obtain the gradients. The hidden-layer activation used here is tanh; using sigmoid instead would give different weights. The course recommends tanh. The resulting formulas are written out below.
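For this architecture (tanh hidden layer, sigmoid output, cross-entropy cost), the chain rule gives the following standard gradient formulas, which the code below implements ($\circ$ denotes element-wise multiplication):

$$dZ^{[2]} = A^{[2]} - Y,\qquad dW^{[2]} = \frac{1}{m}\,dZ^{[2]}A^{[1]T},\qquad db^{[2]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[2](i)}$$

$$dZ^{[1]} = W^{[2]T}dZ^{[2]} \circ \left(1-(A^{[1]})^{2}\right),\qquad dW^{[1]} = \frac{1}{m}\,dZ^{[1]}X^{T},\qquad db^{[1]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[1](i)}$$

The factor $1-(A^{[1]})^{2}$ is the derivative of tanh, since $\tanh'(z) = 1-\tanh^{2}(z)$; this is why the code multiplies by (1 - A1**2).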

def backward_propagation(parameters,cache,X,Y):
    m = X.shape[1]

    W1=parameters['W1']#n_h,n_x
    W2=parameters['W2']#n_y,n_h
    b1=parameters['b1']#n_h,1
    b2=parameters['b2']#n_y,1

    A1=cache['A1']#n_h,m
    A2=cache['A2']#n_y,m
    Z1=cache['Z1']#n_h,m
    Z2=cache['Z2']#n_y,m

    dZ2=A2-Y#n_y,m
    dW2=np.dot(dZ2,A1.T)/m#n_y,n_h
    db2=np.sum(dZ2,axis=1,keepdims=True)/m#n_y,1
    dZ1=np.multiply(np.dot(W2.T,dZ2),(1-A1**2))#n_h,m
    dW1=np.dot(dZ1,X.T)/m#n_h,n_x
    db1=np.sum(dZ1,axis=1,keepdims=True)/m#n_h,1

    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}

    return grads

Test:

parameters, cache, X_assess, Y_assess = backward_propagation_test_case()

grads = backward_propagation(parameters, cache, X_assess, Y_assess)
print ("dW1 = "+ str(grads["dW1"])) print ("db1 = "+ str(grads["db1"])) print ("dW2 = "+ str(grads["dW2"])) print ("db2 = "+ str(grads["db2"])) 

Test result (tanh):

dW1 = [[ 0.01018708 -0.00708701] [ 0.00873447 -0.0060768 ] [-0.00530847 0.00369379] [-0.02206365 0.01535126]]
db1 = [[-0.00069728] [-0.00060606] [ 0.000364 ] [ 0.00151207]]
dW2 = [[ 0.00363613 0.03153604 0.01162914 -0.01318316]]
db2 = [[ 0.06589489]]

Question: Implement the update rule. Use gradient descent. You have to use (dW1, db1, dW2, db2) in order to update (W1, b1, W2, b2).
General gradient descent rule: $\theta = \theta - \alpha \frac{\partial J}{\partial \theta}$, where $\alpha$ is the learning rate and $\theta$ stands for any parameter.
Update the weight parameters:

def update_parameters(parameters,grads,learning_rate=1.2):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    # gradient descent update: the parameters are updated once per iteration
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model()

Build the model by combining the previous parts into a single function.
Question: Build your neural network model in nn_model().

Instructions: The neural network model has to use the previous functions in the right order.

def nn_model(X,Y,n_h,num_iterations=10000,print_cost=False):
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]

    parameters=initialize_parameters(n_x,n_h,n_y)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    for i in range(0,num_iterations):

        A2,cache=forward_propagation(X,parameters)  # forward propagation
        cost = compute_cost(A2, Y, parameters)  # compute the cost
        grads=backward_propagation(parameters,cache,X,Y)  # backward propagation to get the gradients
        parameters=update_parameters(parameters,grads,learning_rate=1.2)  # one gradient descent update of W and b

        if print_cost and i % 1000 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    return parameters

Test:

X_assess, Y_assess = nn_model_test_case()

parameters = nn_model(X_assess, Y_assess, 4, num_iterations=10000, print_cost=False)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

Result:

W1 = [[ -7.13991779 9.27778317] [-12.59245311 2.48423279] [ -7.13853315 9.27827144] [ 7.12809846 -9.27653998]]
b1 = [[ 3.84586106] [ 6.33161315] [ 3.84613897] [-3.84590913]]
W2 = [[-3040.94994726 -2997.63395067 -3040.57705459 3016.53185899]]
b2 = [[-21.48466572]]

4.5 Predictions

Question: Use your model to predict by building predict().
Use forward propagation to predict results.
The prediction function:

def predict(parameters,X):

    A2, cache = forward_propagation(X, parameters)
    predictions = (A2 > 0.5)

    return predictions

Test:

parameters, X_assess = predict_test_case()

predictions = predict(parameters, X_assess)
print("predictions mean = " + str(np.mean(predictions)))

Result:

predictions mean = 0.666666666667

Test on the actual flower dataset:

parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219550
Cost after iteration 9000: 0.218633

Out[52]:

Text(0.5,1,'Decision Boundary for hidden layer size 4')

[Figure: decision boundary learned by the one-hidden-layer network on the flower dataset]

Check the accuracy:

predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')

The resulting 90% is a much higher accuracy than the 47% achieved by logistic regression:

Accuracy: 90%

4.6 - Tuning hidden layer size (optional/ungraded exercise)

The following tests how different hidden layer sizes affect the accuracy.
Run the following code. It may take 1-2 minutes. You will observe different behaviors of the model for various hidden layer sizes.

plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 10, 20]  # different numbers of hidden units
for i, n_h in enumerate(hidden_layer_sizes):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer of size %d' % n_h)
    parameters = nn_model(X, Y, n_h, num_iterations = 5000)
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
    predictions = predict(parameters, X)
    accuracy = float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100)
    print ("Accuracy for {} hidden units: {} %".format(n_h, accuracy))

Results:

Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.5 %
Accuracy for 5 hidden units: 91.25 %
Accuracy for 10 hidden units: 90.25 %
Accuracy for 20 hidden units: 90.5 %

[Figure: decision boundaries for hidden layer sizes 1, 2, 3, 4, 5, 10 and 20]

Interpretation:
- The larger models (with more hidden units) are able to fit the training set better, until eventually the largest models overfit the data.
- The best hidden layer size seems to be around n_h = 5. Indeed, a value around there seems to fit the data well without incurring noticeable overfitting.
- You will also learn later about regularization, which lets you use very large models (such as n_h = 50) without much overfitting.

You've learnt to:
- Build a complete neural network with a hidden layer
- Make good use of a non-linear unit
- Implement forward propagation and backpropagation, and train a neural network
- See the impact of varying the hidden layer size, including overfitting.

5 Performance on other datasets

If you want, you can rerun the whole notebook (minus the dataset part) for each of the following datasets.
Visualize the data:

# Datasets
noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure = load_extra_datasets()

datasets = {"noisy_circles": noisy_circles,
            "noisy_moons": noisy_moons,
            "blobs": blobs,
            "gaussian_quantiles": gaussian_quantiles}

### START CODE HERE ### (choose your dataset)
dataset = "noisy_circles"
### END CODE HERE ###

X, Y = datasets[dataset]
X, Y = X.T, Y.reshape(1, Y.shape[0])

# make blobs binary
if dataset == "blobs":
    Y = Y%2

# Visualize the data
plt.scatter(X[0, :], X[1, :], c=np.squeeze(Y), s=40, cmap=plt.cm.Spectral);

[Figure: scatter plot of the noisy_circles dataset]
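The load_extra_datasets helper also lives in planar_utils and is not shown here. A plausible sketch built on sklearn.datasets (the exact sample counts, noise levels, and random seeds in the course file are assumptions):

def load_extra_datasets_sketch(N=200):
    # each dataset is a (points, labels) tuple, matching how the code above unpacks them
    noisy_circles = sklearn.datasets.make_circles(n_samples=N, factor=0.5, noise=0.3)
    noisy_moons = sklearn.datasets.make_moons(n_samples=N, noise=0.2)
    blobs = sklearn.datasets.make_blobs(n_samples=N, centers=6, n_features=2, random_state=5)
    gaussian_quantiles = sklearn.datasets.make_gaussian_quantiles(cov=0.5, n_samples=N,
                                                                  n_features=2, n_classes=2)
    no_structure = np.random.rand(N, 2), np.random.rand(N, 2)
    return noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure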

Test:

parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))

Result:

Cost after iteration 0: 0.693148
Cost after iteration 1000: 0.399812
Cost after iteration 2000: 0.398242
Cost after iteration 3000: 0.398877
Cost after iteration 4000: 0.399356
Cost after iteration 5000: 0.399656
Cost after iteration 6000: 0.399708
Cost after iteration 7000: 0.399696
Cost after iteration 8000: 0.399675
Cost after iteration 9000: 0.399649
Text(0.5,1,'Decision Boundary for hidden layer size 4')

[Figure: decision boundary for hidden layer size 4 on the noisy_circles dataset]

You can test the other datasets in the same way; the model performs quite well on them too.
Reference:
- http://scs.ryerson.ca/~aharley/neural-networks/
- http://cs231n.github.io/neural-networks-case-study/