Spectral Clustering（学习Free Mind知识整理）

阅读http://blog.pluskid.org/?p=287文章中的一些知识整理：学习的知识点谱聚类。也可以学习一下http://blog.csdn.net/v_july_v/article/details/40738211里面的介绍和链接中的参考链接，总结的非常好。 ================================================================== Spectral Clustering 算法：

根据数据构造一个Graph，Graph的每一个节点对应一个数据点，将各个点连接起来（随后将那些已经被连接起来但并不怎么相似的点，通过cut/RatioCut/NCut 的方式剪开），并且边的权重用于表示数据之间的相似度。把这个Graph用邻接矩阵的形式表示出来，记为。
把的每一列元素加起来得到个数，把它们放在对角线上（其他地方都是零），组成一个的对角矩阵，记为度矩阵，并把 - 的结果记为拉普拉斯矩阵。
求出的前个特征值（前个指按照特征值的大小从小到大排序得到），以及对应的特征向量。
把这个特征（列）向量排列在一起组成一个的矩阵，将其中每一行看作维空间中的一个向量，并使用 K-means 算法进行聚类。聚类的结果中每一行所属的类别就是原来 Graph 中的节点亦即最初的个数据点分别所属的类别。

谱聚类的基本思想便是利用样本数据之间的相似矩阵（拉普拉斯矩阵）进行特征分解（通过Laplacian Eigenmap 的降维方式降维），然后将得到的特征向量进行 K-means聚类。

===================================================================

优点：(1)谱聚类能在任意形状的样本空间上进行聚类，且收敛于全局最优点。而像k-means算法和EM算法是建立在凸球形的样本空间上，当样本空间不凸时，算法会陷入“局部”最优。

(2) 谱聚类只需要数据之间的相似度矩阵就可以了，而不必像K-means那样要求数据必须是 N 维欧氏空间中的向量。

(3)RatioCut方法只考虑了类间相似性最小，而normalizedCut不仅考虑了类间还考虑了类内的相似性。

===================================================================

unnormalized谱聚类的MATLAB实现：

function [ C, L, D, Q, V ] = SpectralClustering(W, k)
% spectral clustering algorithm
% input: adjacency matrix W; number of cluster k 
% return: cluster indicator vectors as columns in C; unnormalized Laplacian L; degree matrix D;
%         eigenvectors matrix Q; eigenvalues matrix V

% calculate degree matrix
degs = sum(W, 2);
D = sparse(1:size(W, 1), 1:size(W, 2), degs);

% compute unnormalized Laplacian
L = D - W;

% compute the eigenvectors corresponding to the k smallest eigenvalues
% diagonal matrix V is NcutL's k smallest magnitude eigenvalues 
% matrix Q whose columns are the corresponding eigenvectors.
[Q, V] = eigs(L, k, 'SA');

% use the k-means algorithm to cluster V row-wise
% C will be a n-by-1 matrix containing the cluster number for each data point
C = kmeans(Q, k);

% convert C to a n-by-k matrix containing the k indicator vectors as columns
C = sparse(1:size(D, 1), C, 1);

end

===================================================================

不错的链接：

孟岩之理解矩阵系列：http://blog.csdn.net/myan/article/details/1865397；
理解矩阵的12点数学笔记：http://www.51weixue.com/thread-476-1-1.html；
一堆wikipedia，比如特征向量：https://zh.wikipedia.org/wiki/%E7%89%B9%E5%BE%81%E5%90%91%E9%87%8F；
wikipedia上关于拉普拉斯矩阵的介绍：http://en.wikipedia.org/wiki/Laplacian_matrix；
邹博之聚类PPT：http://pan.baidu.com/s/1i3gOYJr；
关于谱聚类的一篇非常不错的英文文献，“A Tutorial on Spectral Clustering”：http://engr.case.edu/ray_soumya/mlrg/Luxburg07_tutorial_spectral_clustering.pdf；
知乎上关于矩阵和特征值的两个讨论：http://www.zhihu.com/question/21082351，http://www.zhihu.com/question/21874816；
谱聚类：http://www.cnblogs.com/fengyan/archive/2012/06/21/2553999.html；
谱聚类算法：http://www.cnblogs.com/sparkwen/p/3155850.html；
漫谈 Clustering 系列：http://blog.pluskid.org/?page_id=78；
《Mining of Massive Datasets》第10章：http://infolab.stanford.edu/~ullman/mmds/book.pdf；
Tydsh: Spectral Clustering：①http://blog.sina.com.cn/s/blog_53a8a4710100g2rt.html，②http://blog.sina.com.cn/s/blog_53a8a4710100g2rv.html，③http://blog.sina.com.cn/s/blog_53a8a4710100g2ry.html，④http://blog.sina.com.cn/s/blog_53a8a4710100g2rz.html；
H. Zha, C. Ding, M. Gu, X. He, and H.D. Simon. Spectral relaxation for K-means clustering. Advances in Neural Information Processing Systems 14 (NIPS 2001). pp. 1057-1064, Vancouver, Canada. Dec. 2001；
机器学习中谱聚类方法的研究：http://lamda.nju.edu.cn/conf/MLA07/files/YuJ.pdf；
谱聚类的算法实现：http://liuzhiqiangruc.iteye.com/blog/2117144。

====================================================================

秒客网

Spectral Clustering（学习Free Mind知识整理）

相关文章