Algorithm for clustering images by date

Time: 2020-12-14 07:35:15

Does anyone know of an algorithm that will group pictures into events based on the date each picture was taken? Obviously I can group by date, but I'd like something a little more sophisticated that would (or might) be able to group pictures spanning multiple days based on how frequently they were taken over a certain timespan. Consider the following groupings:

  • 1/2/2009 15 photos
  • 1/3/2009 20 photos
  • 1/4/2009 13 photos
  • 1/5/2009 19 photos
  • 1/15/2009 5 photos

Potentially these would be grouped into two groups:

  1. 1/2/2009 -> 1/5/2009
  2. 1/15/2009

Obviously there will be some tolerance(s) that need to be established.

Is there any well-established way of doing this, other than inventing my own top-down approach?

5 Answers

#1


You can apply pretty much any standard clustering technique to this; it's just a matter of defining your distance function correctly. When you build the matrix of distances between your photos, consider combining the physical distance between their locations (if you have it) with the temporal distance between their creation timestamps. Normalise them, put them on separate dimensions, and you may even be able to use a regular Euclidean distance.
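As a rough sketch of such a distance function, assuming each photo has a timestamp and GPS coordinates: the `Photo` record and the two scale constants below are made up for illustration, and the scales would need tuning for real data.

```python
import math
from datetime import datetime
from typing import NamedTuple


class Photo(NamedTuple):  # hypothetical record type, not from the answer
    taken: datetime
    lat: float
    lon: float


TIME_SCALE_HOURS = 24.0  # assumption: one day apart counts as one unit
DIST_SCALE_KM = 50.0     # assumption: 50 km apart counts as one unit


def haversine_km(a: Photo, b: Photo) -> float:
    """Great-circle distance between two photos, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def photo_distance(a: Photo, b: Photo) -> float:
    """Euclidean distance over normalised time and normalised location."""
    dt = abs((a.taken - b.taken).total_seconds()) / 3600.0 / TIME_SCALE_HOURS
    dx = haversine_km(a, b) / DIST_SCALE_KM
    return math.hypot(dt, dx)


def distance_matrix(photos):
    """Pairwise distances, ready to hand to a clustering routine."""
    return [[photo_distance(a, b) for b in photos] for a in photos]
```

If the photos have no location data, dropping the `dx` term leaves a purely temporal distance.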

Best of luck.

#2


Just group together the pictures that were taken on successive days, that is, with no day in between on which no pictures were taken.
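A minimal sketch of this, assuming you already have the capture date of each photo as a `datetime.date` (the function name is just for illustration):

```python
from datetime import date, timedelta


def group_consecutive_days(days):
    """Group dates into runs in which no calendar day is skipped."""
    groups = []
    for d in sorted(set(days)):
        # Extend the current run if this date is at most one day after its last date.
        if groups and d - groups[-1][-1] <= timedelta(days=1):
            groups[-1].append(d)
        else:
            groups.append([d])
    return groups


# The dates from the question, read as month/day/year.
days = [date(2009, 1, 2), date(2009, 1, 3), date(2009, 1, 4),
        date(2009, 1, 5), date(2009, 1, 15)]
events = group_consecutive_days(days)
# events[0] runs from 2009-01-02 to 2009-01-05; events[1] holds only 2009-01-15
```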

#3


You might try to calculate the tolerance dynamically, based on how many clusters you want to create or how big (in absolute or percentage terms) you want them to be.
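One way to read this, sketched under the assumption that you know the number of events `k` you want: cut the sorted timeline at the `k - 1` largest gaps, so the tolerance falls out of the data instead of being fixed up front. The function name is hypothetical.

```python
from datetime import datetime


def split_into_k(timestamps, k):
    """Split timestamps into k clusters by cutting at the k - 1 largest gaps."""
    ts = sorted(timestamps)
    if k < 1 or k > len(ts):
        raise ValueError("k must be between 1 and the number of photos")
    # Pair each gap with the index of the photo just before it.
    gaps = [(ts[i + 1] - ts[i], i) for i in range(len(ts) - 1)]
    # Indices of the k - 1 largest gaps, put back in chronological order.
    cuts = sorted(i for _, i in sorted(gaps, reverse=True)[:k - 1])
    clusters, start = [], 0
    for i in cuts:
        clusters.append(ts[start:i + 1])
        start = i + 1
    clusters.append(ts[start:])
    return clusters


stamps = [datetime(2009, 1, d, 12) for d in (2, 3, 4, 5, 15)]
clusters = split_into_k(stamps, 2)
```

The size of the smallest gap that was cut is then the effective tolerance for that run.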

#4


To get a useful clustering of pictures by date, you need the following:

1) The number of clusters should be variable, not fixed before clustering.

2) The diameter of each cluster should not exceed a specific amount.

The clustering algorithm that best satisfies both requirements is the QT (quality threshold) clustering algorithm. From Wikipedia:

QT (quality threshold) clustering (Heyer, Kruglyak, Yooseph, 1999) is an alternative method of partitioning data, invented for gene clustering. It requires more computing power than k-means, but does not require specifying the number of clusters a priori, and always returns the same result when run several times.

Although it is mainly used for gene clustering, I think it would fit what you need very well.
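Here is a small sketch of QT clustering specialised to one dimension, assuming the capture times have been converted to plain numbers such as day offsets. In one dimension, the point that increases a candidate's diameter least is always a neighbour of the current range, which keeps the code short. `qt_cluster` and `max_diameter` are names chosen for this example.

```python
def qt_cluster(points, max_diameter):
    """Quality-threshold clustering of 1-D values (e.g. timestamps in days)."""
    remaining = sorted(points)
    clusters = []
    while remaining:
        best_lo, best_hi = 0, -1  # best candidate so far (empty)
        for seed in range(len(remaining)):
            lo = hi = seed  # candidate cluster is remaining[lo:hi + 1]
            while True:
                options = []
                if lo > 0:
                    # Diameter if the left neighbour is added.
                    options.append((remaining[hi] - remaining[lo - 1], "L"))
                if hi + 1 < len(remaining):
                    # Diameter if the right neighbour is added.
                    options.append((remaining[hi + 1] - remaining[lo], "R"))
                options = [o for o in options if o[0] <= max_diameter]
                if not options:
                    break
                # Add the neighbour that keeps the diameter smallest.
                if min(options)[1] == "L":
                    lo -= 1
                else:
                    hi += 1
            if hi - lo > best_hi - best_lo:
                best_lo, best_hi = lo, hi
        # Keep the largest candidate, remove its points, and repeat.
        clusters.append(remaining[best_lo:best_hi + 1])
        del remaining[best_lo:best_hi + 1]
    return clusters


# Day-of-month values from the question, with a maximum event span of 3 days.
print(qt_cluster([2, 3, 4, 5, 15], max_diameter=3))  # [[2, 3, 4, 5], [15]]
```

Clusters come out largest first, not in chronological order, so sort them by their first element if you need a timeline.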

#5


Try to detect the gaps instead of the clusters.
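A sketch of what that could look like, assuming a gap is any pause between consecutive photos of at least `min_gap`. The two-day value used below is an arbitrary choice and the function names are illustrative.

```python
from datetime import datetime, timedelta


def find_gaps(sorted_ts, min_gap):
    """Indices i where the pause between photo i and photo i + 1 is at least min_gap."""
    return [i for i in range(len(sorted_ts) - 1)
            if sorted_ts[i + 1] - sorted_ts[i] >= min_gap]


def split_at_gaps(timestamps, min_gap):
    """Whatever lies between two detected gaps is treated as one event."""
    ts = sorted(timestamps)
    if not ts:
        return []
    events, start = [], 0
    for i in find_gaps(ts, min_gap):
        events.append(ts[start:i + 1])
        start = i + 1
    events.append(ts[start:])
    return events


stamps = [datetime(2009, 1, d, 12) for d in (2, 3, 4, 5, 15)]
events = split_at_gaps(stamps, min_gap=timedelta(days=2))
```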
