1. Building your Deep Neural Network: Step by Step
1.1 Assignment Outline
To build your neural network, you will implement several "helper functions". These helper functions will be used in the next assignment to build a two-layer and an L-layer neural network. Here is the outline of this assignment:

- Initialize the parameters for a two-layer network and for an L-layer neural network.
- Implement the forward propagation module (shown in purple in the assignment's figure):
  - Complete the LINEAR part of a layer's forward propagation step (resulting in $Z^{[l]}$).
  - The ACTIVATION functions (relu/sigmoid) are given to you.
  - Combine the previous two steps into a new [LINEAR -> ACTIVATION] forward function.
  - Stack the [LINEAR -> RELU] forward function $L-1$ times (for layers 1 through $L-1$) and add a [LINEAR -> SIGMOID] at the end (for the final layer $L$). This gives you a new L_model_forward function.
- Compute the loss.
- Implement the backward propagation module (shown in red in the assignment's figure):
  - Complete the LINEAR part of a layer's backward propagation step.
  - The gradients of the ACTIVATION functions (relu_backward/sigmoid_backward) are given to you.
  - Combine the previous two steps into a new [LINEAR -> ACTIVATION] backward function.
  - Stack [LINEAR -> RELU] backward $L-1$ times and add [LINEAR -> SIGMOID] backward at the end. This gives you a new L_model_backward function.
- Finally, update the parameters.
1.2 Initialization
You will write two helper functions to initialize the parameters of your model. The first function is used to initialize the parameters of a two-layer model. The second one generalizes this initialization process to $L$ layers.
1.2.1 2-layer Neural Network
Instructions:
- The model’s structure is: LINEAR -> RELU -> LINEAR -> SIGMOID.
- Use random initialization for the weight matrices: np.random.randn(shape) * 0.01 with the correct shape.
- Use zero initialization for the biases: np.zeros(shape).
def initialize_parameters(n_x, n_h, n_y):
""" Argument: n_x -- size of the input layer (输入层的size) n_h -- size of the hidden layer (隐藏层的size) n_y -- size of the output layer (输出层的size) Returns: parameters -- python dictionary containing your parameters:(返回一个python字典包含你的参数) W1 -- weight matrix of shape (n_h, n_x) b1 -- bias vector of shape (n_h, 1) W2 -- weight matrix of shape (n_y, n_h) b2 -- bias vector of shape (n_y, 1) """
np.random.seed(1)
### START CODE HERE ### (≈ 4 lines of code)
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))
### END CODE HERE ###
assert(W1.shape == (n_h, n_x)) # assert checks that the shape is correct and raises an exception otherwise
assert(b1.shape == (n_h, 1))
assert(W2.shape == (n_y, n_h))
assert(b2.shape == (n_y, 1))
parameters = {"W1": W1,
"b1": b1,
"W2": W2,
"b2": b2}
return parameters
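As a quick sanity check (this call is purely illustrative, not part of the graded notebook), you can initialize a small 3-2-1 network and verify the shapes:

parameters = initialize_parameters(3, 2, 1)   # hypothetical sizes: n_x=3, n_h=2, n_y=1
print(parameters["W1"].shape)  # (2, 3)
print(parameters["b1"].shape)  # (2, 1)
print(parameters["W2"].shape)  # (1, 2)
print(parameters["b2"].shape)  # (1, 1)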
1.2.2 L-layer Neural Network
Instructions:
- The model's structure is [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID. I.e., it has $L-1$ layers using a ReLU activation function followed by an output layer with a sigmoid activation function.
- Use random initialization for the weight matrices: np.random.randn(shape) * 0.01.
- Use zero initialization for the biases: np.zeros(shape).
- We will store $n^{[l]}$, the number of units in each layer, in a variable layer_dims (a python list). For example, the layer_dims for the "Planar Data classification model" from last week would have been [2, 4, 1]: there were two inputs, one hidden layer with 4 hidden units, and an output layer with 1 output unit. That means W1's shape was (4, 2), b1 was (4, 1), W2 was (1, 4) and b2 was (1, 1). Now you will generalize this to $L$ layers!
- Here is the implementation for $L = 1$ (a one-layer neural network). It should inspire you to implement the general case (an L-layer neural network):
if L == 1:
parameters["W" + str(L)] = np.random.randn(layer_dims[1], layer_dims[0]) * 0.01
parameters["b" + str(L)] = np.zeros((layer_dims[1], 1))
def initialize_parameters_deep(layer_dims):
""" Arguments: layer_dims -- python array (list) containing the dimensions of each layer in our network 一个python数组(列表)包含网络中每个层的维度 Returns: parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1]) bl -- bias vector of shape (layer_dims[l], 1) """
np.random.seed(3)
parameters = {}
L = len(layer_dims) # number of layers in the network
for l in range(1, L): # l goes from 1 to L-1
### START CODE HERE ### (≈ 2 lines of code)
parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
### END CODE HERE ###
assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))
return parameters
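For example (the layer sizes here are purely illustrative), layer_dims = [5, 4, 3] describes a network with 5 inputs, one hidden layer of 4 units and an output layer of 3 units, so the initializer should return:

parameters = initialize_parameters_deep([5, 4, 3])  # illustrative sizes
print(parameters["W1"].shape, parameters["b1"].shape)  # (4, 5) (4, 1)
print(parameters["W2"].shape, parameters["b2"].shape)  # (3, 4) (3, 1)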
1.3 Forward Propagation Module
1.3.1 Linear Forward
Now that you have initialized your parameters, you will do the forward propagation module. You will start by implementing some basic functions that you will use later when implementing the model. You will complete three functions in this order:
- LINEAR
- LINEAR -> ACTIVATION where ACTIVATION will be either ReLU or Sigmoid.
- [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID (whole model)
The linear forward module (vectorized over all the examples) computes the following equation:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$$

where $A^{[0]} = X$.
def linear_forward(A, W, b):
""" Implement the linear part of a layer's forward propagation. Arguments: A -- activations from previous layer (or input data): (size of previous layer, number of examples) 来至前一层的激活值(或者输入数据) W -- weights matrix: numpy array of shape (size of current layer, size of previous layer) b -- bias vector, numpy array of shape (size of the current layer, 1) Returns: Z -- the input of the activation function, also called pre-activation parameter 激活函数的输入,也叫预激活参数 cache -- a python dictionary containing "A", "W" and "b" ; stored for computing the backward pass efficiently 缓存 -- 一个python字典 """
### START CODE HERE ### (≈ 1 line of code)
Z = np.dot(W, A) + b
### END CODE HERE ###
assert(Z.shape == (W.shape[0], A.shape[1]))
cache = (A, W, b)
return Z, cache
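A minimal check of linear_forward on small random arrays (the sizes below are arbitrary and only for illustration):

import numpy as np

np.random.seed(2)
A = np.random.randn(3, 2)   # activations from a layer with 3 units, for 2 examples
W = np.random.randn(1, 3)   # current layer has 1 unit
b = np.random.randn(1, 1)
Z, cache = linear_forward(A, W, b)
print(Z.shape)  # (1, 2)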
1.3.2 Linear-Activation Forward
In this notebook, you will use two activation functions:
- Sigmoid: $\sigma(Z) = \sigma(WA + b) = \frac{1}{1 + e^{-(WA + b)}}$. We have provided you with the sigmoid function. This function returns two items: the activation value "A" and a "cache" that contains "Z" (it's what we will feed in to the corresponding backward function). To use it you could just call:

A, activation_cache = sigmoid(Z)
- ReLU: The mathematical formula for ReLU is $A = RELU(Z) = \max(0, Z)$. We have provided you with the relu function. This function returns two items: the activation value "A" and a "cache" that contains "Z" (it's what we will feed in to the corresponding backward function). To use it you could just call:

A, activation_cache = relu(Z)
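The sigmoid and relu functions are provided with the assignment, so you do not implement them yourself. For completeness, a minimal sketch consistent with the interface described above (returning the activation plus a cache holding Z) could look like this:

import numpy as np

def sigmoid(Z):
    # Sketch of the provided helper: element-wise sigmoid, caching Z for the backward pass.
    A = 1 / (1 + np.exp(-Z))
    cache = Z
    return A, cache

def relu(Z):
    # Sketch of the provided helper: element-wise ReLU, caching Z for the backward pass.
    A = np.maximum(0, Z)
    cache = Z
    return A, cache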
def linear_activation_forward(A_prev, W, b, activation):
""" Implement the forward propagation for the LINEAR->ACTIVATION layer Arguments: A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples) W -- weights matrix: numpy array of shape (size of current layer, size of previous layer) b -- bias vector, numpy array of shape (size of the current layer, 1) activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu" 该层使用的激活函数,被存为字符串"sigmoid" or "relu" Returns: A -- the output of the activation function, also called the post-activation value cache -- a python dictionary containing "linear_cache" and "activation_cache"; stored for computing the backward pass efficiently """
if activation == "sigmoid":
# Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
### START CODE HERE ### (≈ 2 lines of code)
Z, linear_cache = linear_forward(A_prev, W, b) # linear_cache is the tuple (A_prev, W, b)
A, activation_cache = sigmoid(Z) # activation_cache holds Z
### END CODE HERE ###
elif activation == "relu":
# Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
### START CODE HERE ### (≈ 2 lines of code)
Z, linear_cache = linear_forward(A_prev, W, b)
A, activation_cache = relu(Z)
### END CODE HERE ###
assert (A.shape == (W.shape[0], A_prev.shape[1]))
cache = (linear_cache, activation_cache) # cache is a tuple
return A, cache
1.3.3 L-Layer Model
Instruction: In the code below, the variable AL will denote $A^{[L]} = \sigma(Z^{[L]}) = \sigma(W^{[L]} A^{[L-1]} + b^{[L]})$. (This is sometimes also called Yhat, i.e., this is $\hat{Y}$.)

Tips:
- Use the functions you had previously written.
- Use a for loop to replicate [LINEAR -> RELU] (L-1) times.
- Don't forget to keep track of the caches in the "caches" list. To add a new value c to a list, you can use list.append(c).
def L_model_forward(X, parameters):
""" Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation Arguments: X -- data, numpy array of shape (input size, number of examples) 数据,Numpy 数组,shape为(输入size,样本数) parameters -- output of initialize_parameters_deep() 来至初始化参数函数的输出 Returns: AL -- last post-activation value caches -- list of caches containing: every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2) the cache of linear_sigmoid_forward() (there is one, indexed L-1) """
caches = []
A = X
L = len(parameters) // 2 # number of layers in the neural network (integer division by 2, because parameters stores both a W and a b for every layer)
# Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
for l in range(1, L):
A_prev = A
### START CODE HERE ### (≈ 2 lines of code)
A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation='relu')
caches.append(cache)
### END CODE HERE ###
# Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
### START CODE HERE ### (≈ 2 lines of code)
AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation='sigmoid')
caches.append(cache)
### END CODE HERE ###
assert(AL.shape == (1,X.shape[1]))
return AL, caches
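Putting the pieces together (the sizes below are made up purely for illustration), a forward pass on random data should produce one activation row per example and one cache per layer:

import numpy as np

np.random.seed(6)
X = np.random.randn(5, 4)                              # 5 input features, 4 examples (illustrative)
parameters = initialize_parameters_deep([5, 4, 3, 1])  # a small 3-layer network (illustrative)
AL, caches = L_model_forward(X, parameters)
print(AL.shape)     # (1, 4)
print(len(caches))  # 3, one cache per layer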
1.4 Cost Function
Now you will implement forward and backward propagation. You need to compute the cost, because you want to check whether your model is actually learning.
Exercise: Compute the cross-entropy cost
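For reference, the cross-entropy cost that the code below computes (referred to as equation (7) in the notebook) is:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[L](i)} + (1 - y^{(i)}) \log\left(1 - a^{[L](i)}\right) \right)$$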
def compute_cost(AL, Y):
""" Implement the cost function defined by equation (7). Arguments: AL -- probability vector corresponding to your label predictions, shape (1, number of examples) Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples) Returns: cost -- cross-entropy cost """
m = Y.shape[1]
# Compute loss from aL and y.
### START CODE HERE ### (≈ 1 lines of code)
cost = (-1 / m) * np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1 - Y, np.log(1 - AL)))
### END CODE HERE ###
cost = np.squeeze(cost) # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
assert(cost.shape == ())
return cost
import numpy as np
Y, AL = compute_cost_test_case()
AL = np.array([[0.01, 0.01, 0.03]])
print(AL)
print(Y)
print("cost = " + str(compute_cost(AL, Y)))
------------
[[ 0.01 0.01 0.03]]
[[1 1 1]]
cost = 4.23896608977
By modifying AL in this way, you can see that whenever AL is far enough from Y, the cost becomes very large!
1.5 Backward Propagation Module
1.5.1 Linear backward
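For layer $l$, given $dZ^{[l]}$ and the cached $(A^{[l-1]}, W^{[l]}, b^{[l]})$, the three gradients computed below follow directly from $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$:

$$dW^{[l]} = \frac{1}{m} \, dZ^{[l]} A^{[l-1]T}, \qquad db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}, \qquad dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$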
def linear_backward(dZ, cache):
""" Implement the linear portion of backward propagation for a single layer (layer l) Arguments: dZ -- Gradient of the cost with respect to the linear output (of current layer l) cost关于线性输出Z的梯度(当前层为l) cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer Returns: dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev dW -- Gradient of the cost with respect to W (current layer l), same shape as W db -- Gradient of the cost with respect to b (current layer l), same shape as b """
A_prev, W, b = cache
m = A_prev.shape[1] # number of examples
### START CODE HERE ### (≈ 3 lines of code)
dW = np.dot(dZ, A_prev.T) / m
db = np.sum(dZ, axis=1, keepdims=True) / m
dA_prev = np.dot(W.T, dZ)
### END CODE HERE ###
assert (dA_prev.shape == A_prev.shape)
assert (dW.shape == W.shape)
assert (db.shape == b.shape)
return dA_prev, dW, db
1.5.2 Linear-Activation backward
Next, you will create a function that merges the two helper functions: linear_backward and the backward step for the activation, linear_activation_backward.

To help you implement linear_activation_backward, we provided two backward functions:

- sigmoid_backward: implements the backward propagation for the SIGMOID unit. You can call it as follows:

dZ = sigmoid_backward(dA, activation_cache)

- relu_backward: implements the backward propagation for the RELU unit. You can call it as follows:

dZ = relu_backward(dA, activation_cache)
If $g(.)$ is the activation function, sigmoid_backward and relu_backward compute

$$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$$
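Like sigmoid and relu, these backward helpers are provided with the assignment. A minimal sketch that matches the formula above (using the Z stored in activation_cache) might be:

import numpy as np

def sigmoid_backward(dA, cache):
    # Sketch of the provided helper: dZ = dA * s * (1 - s), where s = sigmoid(Z).
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    dZ = dA * s * (1 - s)
    return dZ

def relu_backward(dA, cache):
    # Sketch of the provided helper: pass the gradient through where Z > 0, zero it elsewhere.
    Z = cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ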
def linear_activation_backward(dA, cache, activation):
""" Implement the backward propagation for the LINEAR->ACTIVATION layer. Arguments: dA -- post-activation gradient for current layer l 当前层已计算出的激活值梯度 cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu" Returns: dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev dW -- Gradient of the cost with respect to W (current layer l), same shape as W db -- Gradient of the cost with respect to b (current layer l), same shape as b """
linear_cache, activation_cache = cache
if activation == "relu":
### START CODE HERE ### (≈ 2 lines of code)
dZ = relu_backward(dA, activation_cache)
dA_prev, dW, db = linear_backward(dZ, linear_cache)
### END CODE HERE ###
elif activation == "sigmoid":
### START CODE HERE ### (≈ 2 lines of code)
dZ = sigmoid_backward(dA, activation_cache)
dA_prev, dW, db = linear_backward(dZ, linear_cache)
### END CODE HERE ###
return dA_prev, dW, db
1.5.3 L-Model Backward
Now you will implement the backward function for the whole network. Recall that when you implemented the L_model_forward function, at each iteration you stored a cache which contains (X, W, b, and Z). In the backpropagation module, you will use those variables to compute the gradients. Therefore, in the L_model_backward function, you will iterate through all the hidden layers backward, starting from layer L.

Initializing backpropagation:
To backpropagate through this network, we know that the output is $A^{[L]} = \sigma(Z^{[L]})$. Your code thus needs to compute $dAL = \frac{\partial \mathcal{L}}{\partial A^{[L]}}$. To do so, use this formula (derived using calculus, which you don't need in-depth knowledge of):
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) # derivative of cost with respect to AL
def L_model_backward(AL, Y, caches):
""" Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group Arguments: AL -- probability vector, output of the forward propagation (L_model_forward()) Y -- true "label" vector (containing 0 if non-cat, 1 if cat) caches -- list of caches containing:缓存列表包含: every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2) relu激活函数cache的下标是从caches列表的0...L-2 the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1]) sigmoid激活函数下标是L-1 Returns: grads -- A dictionary with the gradients 返回每层W,b的梯度字典 grads["dA" + str(l)] = ... grads["dW" + str(l)] = ... grads["db" + str(l)] = ... """
grads = {}
L = len(caches) # the number of layers (the input layer is not counted when we say how many layers a network has)
m = AL.shape[1]
Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL
# Initializing the backpropagation, i.e. computing the gradient of the cost with respect to AL
### START CODE HERE ### (1 line of code)
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
### END CODE HERE ###
# Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "AL, Y, caches". Outputs: "grads["dAL"], grads["dWL"], grads["dbL"]
### START CODE HERE ### (approx. 2 lines)
current_cache = caches[-1] # the last cache in the list
grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, activation = "sigmoid")
### END CODE HERE ###
for l in reversed(range(L-1)): # l goes from L-2 down to 0
# lth layer: (RELU -> LINEAR) gradients.
# Inputs: "grads["dA" + str(l + 2)], caches". Outputs: "grads["dA" + str(l + 1)] , grads["dW" + str(l + 1)] , grads["db" + str(l + 1)]
### START CODE HERE ### (approx. 5 lines)
current_cache = caches[l] # caches is indexed 0..L-1, so the second-to-last layer has index L-2
dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l+2)], current_cache, activation = "relu")
grads["dA" + str(l + 1)] = dA_prev_temp
grads["dW" + str(l + 1)] = dW_temp
grads["db" + str(l + 1)] = db_temp
### END CODE HERE ###
return grads
1.5.4 Update Parameters
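The gradient descent update applied in the loop below is, for each layer $l = 1, \dots, L$ and learning rate $\alpha$:

$$W^{[l]} := W^{[l]} - \alpha \, dW^{[l]}, \qquad b^{[l]} := b^{[l]} - \alpha \, db^{[l]}$$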
def update_parameters(parameters, grads, learning_rate):
""" Update parameters using gradient descent使用梯度下降更新参数 Arguments: parameters -- python dictionary containing your parameters grads -- python dictionary containing your gradients, output of L_model_backward Returns: parameters -- python dictionary containing your updated parameters parameters["W" + str(l)] = ... parameters["b" + str(l)] = ... """
L = len(parameters) // 2 # number of layers in the neural network
# Update rule for each parameter. Use a for loop.
### START CODE HERE ### (≈ 3 lines of code) # parameters and gradients are indexed from 1 to L
for l in range(L):
parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]
parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]
### END CODE HERE ###
return parameters
1.6 Summary
Congrats on implementing all the functions required for building a deep neural network!
We know it was a long assignment, but going forward it will only get better. The next part of the assignment is easier.
In the next assignment you will put all these together to build two models:
- A two-layer neural network
- An L-layer neural network
You will in fact use these models to classify cat vs non-cat images!
2. Deep Neural Network for Image Classification: Application
2.1 Dataset
You will use the same "Cat vs non-Cat" dataset as in "Logistic Regression as a Neural Network" (Assignment 2). The model you had built there reached 70% test accuracy on classifying cat vs non-cat images. Hopefully, your new model will perform better!
Problem Statement: You are given a dataset (“data.h5”) containing:
- a training set of m_train images labelled as cat (1) or non-cat (0)
- a test set of m_test images labelled as cat and non-cat
- each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB).
Number of training examples: 209
Number of testing examples: 50
Each image is of size: (64, 64, 3)
train_x_orig shape: (209, 64, 64, 3)
train_y shape: (1, 209)
test_x_orig shape: (50, 64, 64, 3)
test_y shape: (1, 50)

After reshaping and standardizing the dataset:
train_x's shape: (12288, 209)
test_x's shape: (12288, 50)
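The (12288, m) shapes come from flattening each (64, 64, 3) image into a column of 64*64*3 = 12288 values and standardizing the pixel values. The preprocessing cell is not reproduced in these notes, but it is typically done along these lines (variable names follow the shapes printed above):

# Flatten each image into a column vector, then scale pixel values to [0, 1].
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T   # shape (12288, 209)
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T      # shape (12288, 50)
train_x = train_x_flatten / 255.
test_x = test_x_flatten / 255.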
2.2 Building the Network
Now that you are familiar with the dataset, it is time to build a deep neural network to distinguish cat images from non-cat images.
You will build two different models:
- A 2-layer neural network
- An L-layer deep neural network
You will then compare the performance of these models, and also try out different values for $L$.
2.2.1 2-layer neural network

The model can be summarized as: INPUT -> LINEAR -> RELU -> LINEAR -> SIGMOID -> OUTPUT.

2.2.2 L-layer deep neural network

The model can be summarized as: [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID.
2.2.3 General methodology
As usual you will follow the Deep Learning methodology to build the model:
1. Initialize parameters / Define hyperparameters
2. Loop for num_iterations:
a. Forward propagation
b. Compute cost function
c. Backward propagation
d. Update parameters (using parameters, and grads from backprop)
3. Use trained parameters to predict labels (a sketch of a predict helper is given below)
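Step 3 uses the predict helper that ships with the notebook; it is not reproduced here, but a minimal sketch consistent with L_model_forward (the 0.5 threshold and the accuracy printout are assumptions about that helper, not a copy of it) would be:

import numpy as np

def predict(X, y, parameters):
    # Forward pass, then threshold the sigmoid output at 0.5 to obtain 0/1 labels.
    AL, _ = L_model_forward(X, parameters)
    predictions = (AL > 0.5).astype(int)
    print("Accuracy: " + str(np.mean(predictions == y)))
    return predictions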
2.3 Two-Layer Neural Network
Question: Use the helper functions you have implemented in the previous assignment to build a 2-layer neural network with the following structure: LINEAR -> RELU -> LINEAR -> SIGMOID. The functions you may need and their inputs are:
def initialize_parameters(n_x, n_h, n_y):
... # initializes parameters specifically for a two-layer network
return parameters
def linear_activation_forward(A_prev, W, b, activation):
... # L_model_forward is not needed here; with only two layers, call this function twice directly
return A, cache
def compute_cost(AL, Y):
...
return cost
def linear_activation_backward(dA, cache, activation):
... # as in the forward pass, call this twice
return dA_prev, dW, db
def update_parameters(parameters, grads, learning_rate):
...
return parameters
def two_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):
""" Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID. Arguments: X -- input data, of shape (n_x, number of examples) Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples) layers_dims -- dimensions of the layers (n_x, n_h, n_y) num_iterations -- number of iterations of the optimization loop learning_rate -- learning rate of the gradient descent update rule print_cost -- If set to True, this will print the cost every 100 iterations Returns: parameters -- a dictionary containing W1, W2, b1, and b2 """
np.random.seed(1)
grads = {}
costs = [] # to keep track of the cost
m = X.shape[1] # number of examples
(n_x, n_h, n_y) = layers_dims
# Initialize parameters dictionary, by calling one of the functions you'd previously implemented
### START CODE HERE ### (≈ 1 line of code)
parameters = initialize_parameters(n_x, n_h, n_y)
### END CODE HERE ###
# Get W1, b1, W2 and b2 from the dictionary parameters.
W1 = parameters["W1"]
b1 = parameters["b1"]
W2 = parameters["W2"]
b2 = parameters["b2"]
# Loop (gradient descent)
for i in range(0, num_iterations):
# Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1". Output: "A1, cache1, A2, cache2".
### START CODE HERE ### (≈ 2 lines of code)
A1, cache1 = linear_activation_forward(X, W1, b1, 'relu')
A2, cache2 = linear_activation_forward(A1, W2, b2, 'sigmoid')
### END CODE HERE ###
# Compute cost
### START CODE HERE ### (≈ 1 line of code)
cost = compute_cost(A2, Y)
### END CODE HERE ###
# Initializing backward propagation
dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))
# Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
### START CODE HERE ### (≈ 2 lines of code)
dA1, dW2, db2 = linear_activation_backward(dA2, cache2, 'sigmoid')
dA0, dW1, db1 = linear_activation_backward(dA1, cache1, 'relu')
### END CODE HERE ###
# Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
grads['dW1'] = dW1
grads['db1'] = db1
grads['dW2'] = dW2
grads['db2'] = db2
# Update parameters.
### START CODE HERE ### (approx. 1 line of code)
parameters = update_parameters(parameters, grads, learning_rate)
### END CODE HERE ###
# Retrieve W1, b1, W2, b2 from parameters
W1 = parameters["W1"]
b1 = parameters["b1"]
W2 = parameters["W2"]
b2 = parameters["b2"]
# Print the cost every 100 iterations
if print_cost and i % 100 == 0:
print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
if print_cost and i % 100 == 0:
costs.append(cost)
# plot the cost
plt.plot(np.squeeze(costs))
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(learning_rate))
plt.show()
return parameters
Note: You may notice that running the model for fewer iterations (say 1500) gives better accuracy on the test set. This is called "early stopping", and we will talk about it in the next course. Early stopping is a way to prevent overfitting.
Congratulations! It seems that your 2-layer neural network has better performance (72%) than the logistic regression implementation (70%, assignment week 2). Let's see if you can do even better with an L-layer model!
2.4 L-Layer Neural Network
Question: Use the helper functions you have implemented previously to build an L-layer neural network with the following structure: [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID. The functions you may need and their inputs are:
def initialize_parameters_deep(layer_dims):
... # use the deep initialization function here, not the two-layer one
return parameters
def L_model_forward(X, parameters):
...
return AL, caches
def compute_cost(AL, Y):
...
return cost
def L_model_backward(AL, Y, caches):
...
return grads
def update_parameters(parameters, grads, learning_rate):
...
return parameters
### CONSTANTS ###
layers_dims = [12288, 20, 7, 5, 1] # 4-layer model (the input layer is not counted)
def L_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):#lr was 0.009
""" Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID. Arguments: X -- data, numpy array of shape (number of examples, num_px * num_px * 3) Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples) layers_dims -- list containing the input size and each layer size, of length (number of layers + 1). learning_rate -- learning rate of the gradient descent update rule num_iterations -- number of iterations of the optimization loop print_cost -- if True, it prints the cost every 100 steps Returns: parameters -- parameters learnt by the model. They can then be used to predict. """
np.random.seed(1)
costs = [] # keep track of cost
# Parameters initialization.
### START CODE HERE ###
parameters = initialize_parameters_deep(layers_dims)
### END CODE HERE ###
# Loop (gradient descent)
for i in range(0, num_iterations):
# Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
### START CODE HERE ### (≈ 1 line of code)
AL, caches = L_model_forward(X, parameters)
### END CODE HERE ###
# Compute cost.
### START CODE HERE ### (≈ 1 line of code)
cost = compute_cost(AL, Y)
### END CODE HERE ###
# Backward propagation.
### START CODE HERE ### (≈ 1 line of code)
grads = L_model_backward(AL, Y, caches)
### END CODE HERE ###
# Update parameters.
### START CODE HERE ### (≈ 1 line of code)
parameters = update_parameters(parameters, grads, learning_rate)
### END CODE HERE ###
# Print the cost every 100 iterations
if print_cost and i % 100 == 0:
print ("Cost after iteration %i: %f" %(i, cost))
if print_cost and i % 100 == 0:
costs.append(cost)
# plot the cost
plt.plot(np.squeeze(costs))
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(learning_rate))
plt.show()
return parameters
Congrats! It seems that your 4-layer neural network has better performance (80%) than your 2-layer neural network (72%) on the same test set.
This is good performance for this task. Nice job!
In the next course on "Improving deep neural networks", you will learn how to obtain even higher accuracy by systematically searching for better hyperparameters (learning_rate, layers_dims, num_iterations, and others you will also learn about there).
2.5 Results Analysis
First, let's take a look at some images the L-layer model labeled incorrectly. This will show a few mislabeled images.
A few types of images the model tends to do poorly on include:
- Cat body in an unusual position
- Cat appears against a background of a similar color
- Unusual cat color and species
- Camera angle
- Brightness of the picture
- Scale variation (cat is very large or small in the image)