Lecture 9 Quiz
Question 1
You are experimenting with two different models for a classification task. The figures below show the classification error on the training data and on the validation data as training progresses, for each of the two models. Which model do you think would perform better on previously unseen test data?
Question 2
The figure below shows the histogram of weights for a learned Neural Network.
Which regularization technique has been used during learning?
Question 3
Suppose you want to regularize the weights of a neural network during training so that many of its weights are very close to zero, but a few are a very long way from zero. Which cost function would you add to your objective function?
Question 4
In a linear regression task, a $d$-dimensional input vector $x$ is used to predict the output value $y$ using the weight vector $w$, where $y = w^T x$. The error function is $E = \frac{1}{2}(t - w^T x)^2$, where $t$ is the target output value. We want to use a student-t cost for the weights: $C = \frac{\lambda}{2} \sum_{i=1}^{d} \log(1 + w_i^2)$.
The total error to be optimized is $E_{tot} = E + C$. What is the expression for $\frac{\partial E_{tot}}{\partial w_i}$?
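One way to check a candidate expression for this derivative is to compare it against a finite-difference estimate. The sketch below is only an illustration: the function names, the random test values, and the step size `eps` are assumptions chosen for the example and are not part of the quiz.

```python
import numpy as np


def total_error(w, x, t, lam):
    """E_tot = 0.5 * (t - w.x)^2 + (lam / 2) * sum_i log(1 + w_i^2)."""
    squared_error = 0.5 * (t - w @ x) ** 2
    student_t_cost = 0.5 * lam * np.sum(np.log1p(w ** 2))
    return squared_error + student_t_cost


def numerical_gradient(w, x, t, lam, eps=1e-6):
    """Central-difference estimate of dE_tot/dw_i for every i."""
    grad = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        step = np.zeros_like(w, dtype=float)
        step[i] = eps
        grad[i] = (total_error(w + step, x, t, lam)
                   - total_error(w - step, x, t, lam)) / (2 * eps)
    return grad


# Arbitrary illustrative values (assumptions, not taken from the quiz).
rng = np.random.default_rng(0)
d = 4
w = rng.normal(size=d)
x = rng.normal(size=d)
t, lam = 0.7, 0.1

print(numerical_gradient(w, x, t, lam))
```

Evaluating a proposed formula at the same `w`, `x`, `t`, and `lam` and comparing it with this estimate (for instance with `np.allclose`) is a quick sanity check on the algebra.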
Question 5
Different regularization methods have different effects on the learning process. For example, L2 regularization penalizes high weight values. L1 regularization penalizes weight values that do not equal zero. Adding noise to the weights during learning ensures that the learned hidden representations take extreme values. Sampling the hidden representations regularizes the network by pushing the hidden representation to be binary during the forward pass, which limits the modeling capacity of the network.
Given the shown histogram of activations (just before the logistic nonlinearity) for a Neural Network, which regularization method has been used (check all that apply)?
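The difference between the L2 and L1 penalties described above can be seen in their gradients. The following is a minimal sketch, assuming the common forms $\frac{\lambda}{2}\sum_i w_i^2$ and $\lambda\sum_i |w_i|$; the function names and example weights are hypothetical, and the L1 gradient at exactly zero is taken to be 0 by convention.

```python
import numpy as np


def l2_penalty(w, lam):
    """(lam / 2) * sum(w_i^2): cost grows quadratically with weight size."""
    return 0.5 * lam * np.sum(w ** 2)


def l1_penalty(w, lam):
    """lam * sum(|w_i|): cost grows linearly with weight size."""
    return lam * np.sum(np.abs(w))


def l2_grad(w, lam):
    """Pull toward zero is proportional to the weight itself."""
    return lam * w


def l1_grad(w, lam):
    """Pull toward zero has the same magnitude for every nonzero weight."""
    return lam * np.sign(w)


# Hypothetical weights: one at zero, one small, one large.
w = np.array([0.0, 0.1, -2.0])
print("L2 gradient:", l2_grad(w, 1.0))
print("L1 gradient:", l1_grad(w, 1.0))
```

Under these assumptions the L2 gradient is small for small weights and large for large ones, while the L1 gradient pushes every nonzero weight toward zero with equal strength.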
Question 6
Suppose we have trained a neural network with one hidden layer and a single logistic output unit to predict whether or not an image contains a bird. If we retrain the network in the same way on the same data but using half as many hidden units, which of the following statements is true: