Cluster analysis

时间:2021-04-10 23:58:13

https://en.wikipedia.org/wiki/Cluster_analysis

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.

Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It is often necessary to modify data preprocessing and model parameters until the result achieves the desired properties.

Besides the term clustering, there are a number of terms with similar meanings, including automatic classification, numerical taxonomy, botryology (from Greek βότρυς "grape") and typological analysis. The subtle differences are often in the usage of the results: while in data mining, the resulting groups are the matter of interest, in automatic classification the resulting discriminative power is of interest.

Cluster analysis was originated in anthropology by Driver and Kroeber in 1932 and introduced to psychology by Zubin in 1938 and Robert Tryon in 1939[1][2] and famously used by Cattell beginning in 1943[3] for trait theory classification in personality psychology.

Cluster analysis的更多相关文章

  1. cluster analysis in data mining

    https://en.wikipedia.org/wiki/K-means_clustering k-means clustering is a method of vector quantizati ...

  2. UVALive 6906 Cluster Analysis 并查集

    Cluster Analysis 题目连接: https://icpcarchive.ecs.baylor.edu/index.php?option=com_onlinejudge&Itemi ...

  3. UVALive 6906 A - Cluster Analysis

    思路:排个序,依次选就好了. #include <bits/stdc++.h> #define PB push_back #define MP make_pair using namesp ...

  4. K-Means 聚类算法

    K-Means 概念定义: K-Means 是一种基于距离的排他的聚类划分方法. 上面的 K-Means 描述中包含了几个概念: 聚类(Clustering):K-Means 是一种聚类分析(Clus ...

  5. 地理信息系统 - ArcGIS - 高&sol;低聚类分析工具&lpar;High&sol;Low Clustering ---Getis-Ord General G&rpar;

    前段时间在学习空间统计相关的知识,于是把ArcGIS里Spatial Statistics工具箱里的工具好好研究了一遍,同时也整理了一些笔记上传分享.这一篇先聊一些基础概念,工具介绍篇随后上传. 空间 ...

  6. K-means聚类算法

    聚类分析(英语:Cluster analysis,亦称为群集分析) K-means也是聚类算法中最简单的一种了,但是里面包含的思想却是不一般.最早我使用并实现这个算法是在学习韩爷爷那本数据挖掘的书中, ...

  7. Bioinformatics Glossary

    原文:http://homepages.ulb.ac.be/~dgonze/TEACHING/bioinfo_glossary.html Affine gap costs: A scoring sys ...

  8. &lpar;转&rpar; Deep Learning Research Review Week 2&colon; Reinforcement Learning

      Deep Learning Research Review Week 2: Reinforcement Learning 转载自: https://adeshpande3.github.io/ad ...

  9. Spark入门实战系列--8&period;Spark MLlib(下)--机器学习库SparkMLlib实战

    [注]该系列文章以及使用到安装包/测试数据 可以在<倾情大奉送--Spark入门实战系列>获取 .MLlib实例 1.1 聚类实例 1.1.1 算法说明 聚类(Cluster analys ...

随机推荐

  1. 7&period;openstack之mitaka搭建dashboard

    部署控制面板dashboard 控制节点 1.安装软件包 yum install openstack-dashboard -y 2.配置 vim /etc/openstack-dashboard/lo ...

  2. HashTree(哈希树) ——和trie类似,只是将字符换成了质数,sphinx用到了???

    摘自:http://blog.csdn.net/yang_yulei/article/details/46337405 哈希树的理论基础 [质数分辨定理] 简单地说就是:n个不同的质数可以" ...

  3. sdut 2162&colon;The Android University ACM Team Selection Contest(第二届山东省省赛原题,模拟题)

    The Android University ACM Team Selection Contest Time Limit: 1000ms   Memory limit: 65536K  有疑问?点这里 ...

  4. &lbrack;转&rsqb; java编程规范

    原文链接: 资料推荐--Google Java编码规范 之前已经推荐过Google的Java编码规范英文版了: http://google-styleguide.googlecode.com/svn/ ...

  5. Python Semaphore

    Semaphore信号量的使用 信号量: 互斥锁 同时只允许一个线程更改数据,而Semaphore是同时允许一定数量的线程更改数据 ,比如厕所有3个坑,那最多只允许3个人上厕所,后面的人只能等里面有人 ...

  6. 【驱动】MTD子系统分析

    MTD介绍 MTD,Memory Technology Device即内存技术设备 字符设备和块设备的区别在于前者只能被顺序读写,后者可以随机访问:同时,两者读写数据的基本单元不同. 字符设备,以字节 ...

  7. &lbrack;django&rsqb;django的orm查询

    实体 实体 出版社 category 作者 tag 书 文章 先学习一下基础的增删查改 django orm增删改查: https://www.cnblogs.com/iiiiiher/article ...

  8. jQuery插件——可以动态改动颜色的jQueryColor

    1.请选择代码框中所有代码后, 保存为 jquery.color.js /*! * jQuery Color Animations v@VERSION * https://github.com/jqu ...

  9. SecureCRT 用ssh key登录配置方法

    服务器端配置 OS: Debian-6.0.5 复制代码 代码如下: #apt-get install ssh 安装ssh服务 编辑/etc/ssh/sshd_config配置文件 复制代码 代码如下 ...

  10. Android批量图片加载经典系列——Volley框架实现多布局的新闻列表

    一.问题描述 Volley是Google 2013年发布的实现Android平台上的网络通信库,主要提供网络通信和图片下载的解决方案,比如以前从网上下载图片的步骤可能是这样的流程: 在ListAdap ...