-
概念
ϵ 邻域:
给定点的ϵ 为半径的区域核心点(core points):
如果点p 的ϵ 邻域内的点数大于minPts ,那么p 是核心点直接可达(directly reachable):
核心点p 到其ϵ 邻域内的所有点是直接可达的。(注意必须是p 必须是核心点)可达(reachable):
如果存在一条路径p1=p,p2,...,pn−1,pn=q ,如果对于任意i ,pi 到pi+1 是直接可达的,那么p 到q 是可达的。异常点(outliers):
对于点p ,如果没有任意一个点到p 可达,那么称p 是异常点 -
原理
簇(cluster):
如果p是核心点,那么它可达到的所有点组成一个簇。边界点(edge):
簇中的非核心点称为边界点,因为它们无法到达其他点,而只能构成簇的边界。实例:
伪代码:
DBSCAN的过程可以理解为从一个核心出发,不断扩展簇的范围,直到遇到边界点为止。
DBSCAN(D, eps, MinPts) {
C = 0
for each point P in dataset D {
if P is visited
continue next point
mark P as visited
NeighborPts = regionQuery(P, eps)
if sizeof(NeighborPts) < MinPts
mark P as NOISE
else {
C = next cluster
expandCluster(P, NeighborPts, C, eps, MinPts)
}
}
}
expandCluster(P, NeighborPts, C, eps, MinPts) {
add P to cluster C
for each point P' in NeighborPts {
if P' is not visited {
mark P' as visited
NeighborPts' = regionQuery(P', eps)
if sizeof(NeighborPts') >= MinPts
NeighborPts = NeighborPts joined with NeighborPts'
}
if P' is not yet member of any cluster
add P' to cluster C
}
}
regionQuery(P, eps)
return all points within P's eps-neighborhood (including P)