[Please credit the source when reposting] http://www.cnblogs.com/mashiqi
2015/3/13
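When the latent variable takes only finitely many values (say $N$), we can represent the latent variable of the $j$-th observation $p_j$ as a one-hot vector ${z_j} = [{z_{j1}},{z_{j2}}, \cdots ,{z_{jN}}]$, where ${z_{jk}} \in \{ 0,1\} $ and ${z_{j1}} + {z_{j2}} + \cdots + {z_{jN}} = 1$. This representation mainly serves to simplify the calculations that follow. Suppose that: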
$$\left\{ \matrix{
p({z_{jk}} = 1) = {\pi _k}\cr
p({p_j}|{z_{jk}} = 1;\theta ) = {f_k}({p_j};\theta ) \cr} \right.$$
Then, since exactly one of the exponents $z_{jk}$ equals 1, we can write $p({p_j},{z_j};\theta )$ as:
$$p({p_j},{z_j};\theta ) = \mathop \prod \limits_{k = 1}^N {[{\pi _k}{f_k}({p_j};\theta )]^{{z_{jk}}}}$$
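To make the one-hot trick concrete, here is a small sketch assuming one-dimensional Gaussian components $f_k$ (the text keeps $f_k$ general; the names `gaussian_pdf`, `joint_prob` and `one_hot` are hypothetical). Because every exponent except one is zero, the product just selects $\pi_k f_k(p_j;\theta)$ for the active component, and summing over the $N$ possible $z_j$ gives $p(p_j;\theta)$:

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x (assumed form of f_k for this sketch)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def joint_prob(x, z, pis, means, variances):
    """p(x, z; theta) = prod_k [pi_k f_k(x)]^{z_k} for a one-hot vector z."""
    assert sum(z) == 1 and all(zk in (0, 1) for zk in z)
    prob = 1.0
    for zk, pi, m, v in zip(z, pis, means, variances):
        prob *= (pi * gaussian_pdf(x, m, v)) ** zk  # factor is 1 whenever z_k = 0
    return prob

def one_hot(k, n):
    """One-hot vector of length n with a 1 in position k."""
    return [1 if i == k else 0 for i in range(n)]

pis, means, variances = [0.3, 0.7], [0.0, 3.0], [1.0, 0.5]  # made-up parameters
x = 1.2
# Summing the joint over the N possible one-hot vectors recovers p(x; theta).
marginal = sum(joint_prob(x, one_hot(k, 2), pis, means, variances) for k in range(2))
```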
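Next, let us derive a lower bound on the log-likelihood of the observed data $p_1, \cdots, p_M$; this is where the (expected) complete-data log-likelihood comes from: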
$$\eqalign{
L(\theta ) &= \mathop \sum \limits_{j = 1}^M \ln p({p_j};\theta ) = \mathop \sum \limits_{j = 1}^M \ln [\mathop \sum \limits_k^{} p({p_j},{z_{jk}} = 1;\theta )] \cr
&= \mathop \sum \limits_{j = 1}^M \ln [\mathop \sum \limits_k^{} p({p_j},{z_{jk}} = 1;\theta ){{p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})} \over {p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})}}] \cr
&= \mathop \sum \limits_{j = 1}^M \ln [\mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}}){{p({p_j},{z_{jk}} = 1;\theta )} \over {p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})}}] \cr
&\ge \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})\ln [{{p({p_j},{z_{jk}} = 1;\theta )} \over {p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})}}] \quad \text{(Jensen's inequality, since ln is concave)} \cr
&= \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})\ln [{{p({p_j},{z_{jk}} = 1;\theta )} \over {p({p_j},{z_{jk}} = 1;{\theta ^{(n)}})}}p({p_j};{\theta ^{(n)}})] \cr
&= \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})\ln [{{p({p_j},{z_{jk}} = 1;\theta )} \over {p({p_j},{z_{jk}} = 1;{\theta ^{(n)}})}}] \cr
&+ \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})\ln [p({p_j};{\theta ^{(n)}})] \cr
&= \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})\ln [{{p({p_j},{z_{jk}} = 1;\theta )} \over {p({p_j},{z_{jk}} = 1;{\theta ^{(n)}})}}] + \mathop \sum \limits_{j = 1}^M \ln p({p_j};{\theta ^{(n)}}) \cr
&= \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})\ln [{{p({p_j},{z_{jk}} = 1;\theta )} \over {p({p_j},{z_{jk}} = 1;{\theta ^{(n)}})}}] + L({\theta ^{(n)}}) \cr} $$
Therefore, defining $l(\theta ) = \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})\ln [{{p({p_j},{z_{jk}} = 1;\theta )} \over {p({p_j},{z_{jk}} = 1;{\theta ^{(n)}})}}]$, we obtain:
$$\left\{ \matrix{
l({\theta ^{(n)}}) = 0 \cr
L(\theta ) \ge l(\theta ) + L({\theta ^{(n)}}) \cr} \right.$$
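Both facts can be checked numerically. The sketch below assumes 1-D Gaussian components and treats $\theta$ as the tuple of all mixing weights, means and variances; the data, the parameter values and the helper names (`joint`, `log_lik`, `lower_bound_term`) are made up for illustration:

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x (assumed form of f_k)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def joint(x, k, params):
    """p(x, z_k = 1; params) = pi_k f_k(x; params)."""
    pis, means, variances = params
    return pis[k] * gaussian_pdf(x, means[k], variances[k])

def log_lik(data, params):
    """L(theta) = sum_j ln sum_k p(p_j, z_jk = 1; theta)."""
    n = len(params[0])
    return sum(math.log(sum(joint(x, k, params) for k in range(n))) for x in data)

def lower_bound_term(data, params, params_n):
    """l(theta) from the text, with the posteriors evaluated at theta^(n)."""
    n = len(params[0])
    total = 0.0
    for x in data:
        px_n = sum(joint(x, k, params_n) for k in range(n))
        for k in range(n):
            post = joint(x, k, params_n) / px_n  # p(z_jk = 1 | p_j; theta^(n))
            total += post * math.log(joint(x, k, params) / joint(x, k, params_n))
    return total

data = [-0.5, 0.1, 0.4, 2.8, 3.1, 3.5]
theta_n = ([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])  # current iterate theta^(n)
theta = ([0.4, 0.6], [0.2, 3.0], [0.8, 0.6])    # some other theta

gap = log_lik(data, theta) - (lower_bound_term(data, theta, theta_n) + log_lik(data, theta_n))
print(lower_bound_term(data, theta_n, theta_n))  # 0.0, i.e. l(theta^(n)) = 0
print(gap >= 0)                                   # True: L(theta) >= l(theta) + L(theta^(n))
```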
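If we can find a maximizer $\theta^{*}$ of $l(\theta )$, then $l(\theta^{*}) \ge l(\theta^{(n)}) = 0$, so necessarily
$$L({\theta ^*}) \ge l({\theta ^*}) + L({\theta ^{(n)}}) \ge L({\theta ^{(n)}})$$
We can therefore take $\theta^{*}$ as $\theta^{(n+1)}$, and the log-likelihood never decreases from one iteration to the next.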
Since
$$\eqalign{
l(\theta ) &= \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})\ln [{{p({p_j},{z_{jk}} = 1;\theta )} \over {p({p_j},{z_{jk}} = 1;{\theta ^{(n)}})}}] \cr
&= \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})\ln p({p_j},{z_{jk}} = 1;\theta ) + const \cr
&= {\cal Q}(\theta ,{\theta ^{(n)}}) + const \cr
{\cal Q}(\theta ,{\theta ^{(n)}}) &= \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})\ln p({p_j},{z_{jk}} = 1;\theta ) \cr} $$
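Because the constant does not depend on $\theta$, maximizing $l(\theta )$ is equivalent to maximizing its first term ${\cal Q}(\theta ,{\theta ^{(n)}})$, and many introductions to the EM algorithm work with ${\cal Q}(\theta ,{\theta ^{(n)}})$ directly. Inside this term: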
$$\eqalign{
p({p_j},{z_{jk}} = 1;\theta ) &= p({z_{jk}} = 1;\theta )p({p_j}|{z_{jk}} = 1;\theta ) \cr
&= {\pi _k}{f_k}({p_j};\theta ) \cr} $$
Substituting this into ${\cal Q}$ gives:
$$\eqalign{
{\cal Q}(\theta ,{\theta ^{(n)}}) &= \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})\ln [{\pi _k}{f_k}({p_j};\theta )] \cr
&= \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})[\ln {\pi _k} + \ln {f_k}({p_j};\theta )] \cr} $$
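Because ${\cal Q}$ splits into a part that depends only on $\pi$ and a part that depends only on the $f_k$ (assuming, as in the model above, that $\pi$ does not enter $f_k$), the mixing weights have a closed-form update whatever the form of $f_k$. A sketch of the standard Lagrange-multiplier argument, with the constraint $\sum_k \pi_k = 1$ and multiplier $\lambda$:

$$\eqalign{
{\partial \over {\partial {\pi _k}}}\Big[ \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_K^{} p({z_{jK}} = 1|{p_j};{\theta ^{(n)}})\ln {\pi _K} - \lambda \big(\mathop \sum \limits_K^{} {\pi _K} - 1\big) \Big] &= \mathop \sum \limits_{j = 1}^M {{p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})} \over {{\pi _k}}} - \lambda = 0 \cr
\Rightarrow {\pi _k} &= {1 \over \lambda }\mathop \sum \limits_{j = 1}^M p({z_{jk}} = 1|{p_j};{\theta ^{(n)}}) \cr} $$

Summing over $k$ and using $\sum_k p({z_{jk}} = 1|{p_j};{\theta ^{(n)}}) = 1$ for every $j$ gives $\lambda = M$, hence

$$\pi _k^{(n + 1)} = {1 \over M}\mathop \sum \limits_{j = 1}^M p({z_{jk}} = 1|{p_j};{\theta ^{(n)}})$$

that is, the new weight of component $k$ is the average posterior probability of that component over the data.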
To evaluate it, we need the posterior probability (this is the E-step):
$$\eqalign{
p({z_{jk}} = 1|{p_j};{\theta ^{(n)}}) &= {{p({p_j},{z_{jk}} = 1;{\theta ^{(n)}})} \over {p({p_j};{\theta ^{(n)}})}} = {{p({p_j},{z_{jk}} = 1;{\theta ^{(n)}})} \over {\mathop \sum \limits_K^{} p({p_j},{z_{jK}} = 1;{\theta ^{(n)}})}} \cr
&= {{p({z_{jk}} = 1;{\theta ^{(n)}})p({p_j}|{z_{jk}} = 1;{\theta ^{(n)}})} \over {\mathop \sum \limits_K^{} p({z_{jK}} = 1;{\theta ^{(n)}})p({p_j}|{z_{jK}} = 1;{\theta ^{(n)}})}} \cr
&= {{\pi _k^{(n)}{f_k}({p_j};{\theta ^{(n)}})} \over {\mathop \sum \limits_K^{} \pi _K^{(n)}{f_K}({p_j};{\theta ^{(n)}})}} \cr} $$
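A sketch of this E-step for 1-D Gaussian components (an assumption; `posteriors` and `log_gaussian_pdf` are hypothetical names). It works in log space and subtracts the largest term before exponentiating (the log-sum-exp trick), a standard numerical safeguard rather than part of the derivation:

```python
import math

def log_gaussian_pdf(x, mean, var):
    """ln f_k(x) for an assumed 1-D Gaussian component N(mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def posteriors(x, pis, means, variances):
    """p(z_k = 1 | x; theta^(n)) = pi_k f_k(x) / sum_K pi_K f_K(x)."""
    logs = [math.log(pi) + log_gaussian_pdf(x, m, v)
            for pi, m, v in zip(pis, means, variances)]
    top = max(logs)  # subtracting the max keeps at least one weight equal to 1
    weights = [math.exp(a - top) for a in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Equal weights and variances: a point midway between the means is split evenly.
print(posteriors(1.0, [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]))  # [0.5, 0.5]
```

For a far-away point such as `x = 1000.0`, computing $f_k$ directly would underflow to 0 for every component and give $0/0$; the shifted version still returns valid probabilities.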
Substituting this posterior into ${\cal Q}$ gives
$${\cal Q}(\theta ,{\theta ^{(n)}}) = \mathop \sum \limits_{j = 1}^M \mathop \sum \limits_k^{} {{\pi _k^{(n)}{f_k}({p_j};{\theta ^{(n)}})} \over {\mathop \sum \limits_K^{} \pi _K^{(n)}{f_K}({p_j};{\theta ^{(n)}})}}[\ln {\pi _k} + \ln {f_k}({p_j};\theta )]$$
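Solving the optimization problem (the M-step)
$$[{\pi ^{(n + 1)}},{\theta ^{(n + 1)}}] = \mathop {\arg \max }\limits_{\pi ,\theta } {\cal Q}(\theta ,{\theta ^{(n)}}) \quad \text{s.t.} \quad \mathop \sum \limits_{k = 1}^N {\pi _k} = 1$$
yields the parameters for the next iteration.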
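Putting the E-step and M-step together, here is a minimal sketch of the whole iteration, assuming 1-D Gaussian components. For that choice the maximizer of ${\cal Q}$ is the standard closed form: $\pi_k$ is the average posterior, and each mean and variance is the posterior-weighted sample mean and variance. The data, the starting point and all function names are hypothetical. The loop records $L(\theta^{(n)})$ to illustrate that it never decreases:

```python
import math

def log_gaussian_pdf(x, mean, var):
    """ln f_k(x) for an assumed 1-D Gaussian component N(mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def log_joint(x, pis, means, variances):
    """[ln(pi_k f_k(x)) for k = 1..N]."""
    return [math.log(pi) + log_gaussian_pdf(x, m, v)
            for pi, m, v in zip(pis, means, variances)]

def log_likelihood(data, pis, means, variances):
    """L(theta) = sum_j ln sum_k pi_k f_k(p_j), computed with log-sum-exp."""
    total = 0.0
    for x in data:
        logs = log_joint(x, pis, means, variances)
        top = max(logs)
        total += top + math.log(sum(math.exp(a - top) for a in logs))
    return total

def em_step(data, pis, means, variances):
    """One EM iteration: E-step posteriors, then the closed-form M-step."""
    # E-step: resp[j][k] = p(z_jk = 1 | p_j; theta^(n)).
    resp = []
    for x in data:
        logs = log_joint(x, pis, means, variances)
        top = max(logs)
        w = [math.exp(a - top) for a in logs]
        s = sum(w)
        resp.append([wi / s for wi in w])
    # M-step: pi_k = (1/M) sum_j resp[j][k]; weighted mean and variance per component.
    new_pis, new_means, new_vars = [], [], []
    for k in range(len(pis)):
        nk = sum(r[k] for r in resp)
        mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk
        new_pis.append(nk / len(data))
        new_means.append(mean)
        new_vars.append(var)
    return new_pis, new_means, new_vars

data = [-1.2, -0.8, -0.3, 0.1, 0.4, 2.7, 3.0, 3.2, 3.6, 4.1]  # made-up sample
params = ([0.5, 0.5], [0.0, 1.0], [1.0, 1.0])                # arbitrary start
history = [log_likelihood(data, *params)]
for _ in range(30):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
# L(theta^(n+1)) >= L(theta^(n)) at every step, up to floating-point rounding.
print(all(b >= a - 1e-9 for a, b in zip(history, history[1:])))
```

The sketch places no lower bound on the variances. A component that collapses onto a single data point drives $L(\theta)$ to infinity, so practical implementations usually add such a floor; it is left out here so that the M-step remains an exact maximizer of ${\cal Q}$.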