Lecture 13 Quiz
Announcement, added on Thursday, November 15 2012, 18:28 UTC. You may find this explanation of SBNs helpful.
This quiz is going to take you through the details of Sigmoid Belief Networks (SBNs). The most relevant videos are the second video ("Belief Nets", especially from 11:44) and third video ("Learning sigmoid belief nets") of lecture 13.
We'll be working with this network: two hidden units, h1 and h2, each with a directed connection to a single visible unit, v.
The network has no biases (or equivalently, the biases are always zero), so it has only two parameters: w1 (the weight on the connection from h1 to v) and w2 (the weight on the connection from h2 to v).
Remember, the units in an SBN are all binary, and the logistic function (also known as the sigmoid function) figures prominently in the definition of SBNs. These binary units, with their logistic/sigmoid probability function, are in a sense the stochastic equivalent of the deterministic logistic hidden units that we've seen often in earlier lectures.
Let's start with w1 = −6.90675478 and w2 = 0.40546511. These numbers were chosen so that many of the answers come out as very simple numbers, which might make it easier to understand what's going on. Let's also pick a complete configuration to focus on: h1=0, h2=1, v=1 (we'll call that configuration C011).
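If you'd like to check your arithmetic as you go, here is a minimal Python sketch of this setup. The function and variable names (sigmoid, w1, w2, h1, h2, v) are my own choices, not from the course materials.

```python
import math

def sigmoid(x):
    """The logistic (sigmoid) function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Weights from the quiz setup; the network has no biases.
w1 = -6.90675478   # weight on the connection from h1 to v
w2 = 0.40546511    # weight on the connection from h2 to v

# The complete configuration we'll focus on, C011.
h1, h2, v = 0, 1, 1
```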
Question 1
What is P(v=1 | h1=0, h2=1)? Write your answer with four digits after the decimal point. Hint: the last three of those four digits are zeros. (If you're lost on this question, then I strongly recommend that you do whatever you need to do to figure it out before proceeding with the rest of this quiz.)
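In an SBN, the visible unit turns on with probability given by the logistic of its total input from its parents, so this question comes down to one call of the sigmoid function. A minimal sketch, continuing the setup above:

```python
# P(v=1 | h1, h2) = sigmoid(w1*h1 + w2*h2); with h1=0 and h2=1,
# the total input to v is just w2.
p_v1 = sigmoid(w1 * h1 + w2 * h2)
print(f"{p_v1:.4f}")
```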
Question 2
What is the probability of that full configuration, i.e. P(h1=0, h2=1, v=1), which we called P(C011)? Write your answer with four digits after the decimal point. Hint: it's less than a half, and the last two of those four digits are zeros.
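A belief net defines the probability of a complete configuration as the product of each unit's probability given its parents. The hidden units here have no parents and no biases, so each is on (or off) with probability sigmoid(0) = 0.5. A sketch continuing the code above (joint is my own helper name):

```python
def joint(h1, h2, v, w1, w2):
    """P(h1, h2, v) for this two-hidden-unit, no-bias SBN."""
    # Each hidden unit has no parents and no bias, so P(h1) = P(h2) = 0.5
    # regardless of the unit's value.
    p_hidden = 0.5 * 0.5
    p_v1 = sigmoid(w1 * h1 + w2 * h2)
    return p_hidden * (p_v1 if v == 1 else 1.0 - p_v1)

print(f"{joint(0, 1, 1, w1, w2):.4f}")   # P(C011)
```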
Question 3
Now let's talk about the gradient that we need for learning, i.e. ∂ log P(C011) / ∂wi. There are two ways you can answer the next two questions, and I recommend that you do both and verify that the answers come out the same. The first way is to take the derivative yourself. The second is to use the learning rule that was mentioned in the lecture.
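Here is a sketch of both ways, assuming the learning rule from the lecture takes the form ∂ log P(C) / ∂wi = hi (v − p), where p = P(v=1 | h1, h2); the finite-difference check and the helper name logp are my own:

```python
# Way 1: the learning rule from the lecture (assumed form):
# d log P(C) / d w_i = h_i * (v - p), with p = P(v=1 | h1, h2).
p = sigmoid(w1 * h1 + w2 * h2)
grad_rule = (h1 * (v - p), h2 * (v - p))

# Way 2: a numerical check by central finite differences on log P(C011).
def logp(w1, w2):
    return math.log(joint(h1, h2, v, w1, w2))

eps = 1e-6
grad_numeric = ((logp(w1 + eps, w2) - logp(w1 - eps, w2)) / (2 * eps),
                (logp(w1, w2 + eps) - logp(w1, w2 - eps)) / (2 * eps))

print(f"{grad_rule[0]:.3f}  {grad_rule[1]:.3f}")
print(f"{grad_numeric[0]:.3f}  {grad_numeric[1]:.3f}")  # should match Way 1
```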
What is ∂ log P(C011) / ∂w1? Write your answer with at least three digits after the decimal point, and don't be too surprised if it's a very simple answer.
Question 4
What is ∂ log P(C011) / ∂w2? Write your answer with at least three digits after the decimal point, and don't be too surprised if it's a very simple answer.
Question 5
As was explained in the lectures, the log likelihood gradient for a full configuration is just one part of the learning. The more difficult part is to get a handle on the posterior probability distribution over full configurations, given the state of the visible units. Explaining away is an important issue there. Let's explore it with new weights: for the remainder of this quiz, w1 = 10 and w2 = −4.
What is P(h2=1 | v=1, h1=0)? Give your answer with at least four digits after the decimal point. Hint: it's a fairly small number (and not a round number like for the earlier questions); try to intuitively understand why it's small. Second hint: you might find Bayes' rule useful, but even with that rule, this still requires some thought.
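One way to organize the Bayes'-rule computation: the prior P(h2) is 0.5 either way (no biases), so the posterior reduces to a ratio of likelihoods. A sketch continuing the code above (posterior_h2 is my own helper name):

```python
def posterior_h2(v, h1, w1, w2):
    """P(h2=1 | v, h1) by Bayes' rule."""
    # P(h2 | v, h1) is proportional to P(h2) * P(v | h1, h2); with no
    # biases, P(h2=0) = P(h2=1) = 0.5, so the prior cancels out.
    like = {h2: sigmoid(w1 * h1 + w2 * h2) for h2 in (0, 1)}
    if v == 0:
        like = {h2: 1.0 - p for h2, p in like.items()}
    return like[1] / (like[0] + like[1])

w1, w2 = 10.0, -4.0   # the new weights for questions 5 and 6
print(f"{posterior_h2(1, 0, w1, w2):.4f}")
```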
Question 6
What is P(h2=1 | v=1, h1=1)? Give your answer with at least four digits after the decimal point. Hint: it's quite different from the answer to the previous question; try to understand why. The fact that those two are different shows that, conditional on the state of the visible units, the hidden units have a strong effect on each other, i.e. they're not independent. That is what we call explaining away, and the earthquake vs. truck network is another example of it.
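If you're following along in code, the same helper from the previous sketch answers this question with the other hidden unit switched on:

```python
print(f"{posterior_h2(1, 1, w1, w2):.4f}")
```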