我如何知道一个算法(例如k-means)要运行多久?

时间:2022-12-25 21:26:16

For example, I'm running the k-means algorithm on 1 million data points. Each point is 128-dimensional, and I want 1000 clusters. Wikipedia tells me that its complexity is n^(dk+1)log(n), where d is the number of dimensions, k number of clusters and n number of instances. Knowing that, how can I get an estimate of how long it will run on my 8-Gb RAM, 2.6GHz Intel Core i5 MacBook Pro? What is the best way to calculate this estimate? Is there a way to calculate it theoretically or should I do a few experiments on smaller sets and see how long it takes?
I would really like to have rough estimate before I spend hours or days without knowing when it might stop. Thanks so much for your help! I really appreciate it :).

例如,我运行的k-means算法在100万个数据点上。每个点都是128维的,我想要1000个集群。我在*上看到,其复杂性是n ^(dk + 1)log(n),k d维度的数量,数量的集群和n的实例数量。知道了这一点,我怎么能估计出它在我的8-Gb、2.6GHz的英特尔酷睿i5 MacBook Pro上运行的时间呢?计算这个估计值的最好方法是什么?有没有一种方法来计算它呢?或者我应该做几个小的实验,看看需要多长时间?我真的希望在我花费数小时或数天时间而不知道何时会停止之前有一个粗略的估计。非常感谢你的帮助!我真的很感激。

Ps. I'm using pythons' scipy kmeans

我用的是python的scipy kmeans

2 个解决方案

#1


0  

Do some experiments. k-means is popular only because it usually runs much, much faster than that asymptotic bound would suggest.

做一些实验。k-means之所以受欢迎,仅仅是因为它的运行速度通常比渐近线所表示的快得多。

#2


0  

There is so much machine-related and algorithm-related specifics which is hidden in the big-O constant that it won't be possible to estimate it (for your machine and your SciPy) theoretically.

有太多与机器相关的和与算法相关的细节隐藏在big-O常量中,因此无法从理论上估计它(对于您的机器和SciPy)。

However nothing will prevent you from finding the constant experimentally - as you said: "do a few experiments on smaller sets".

然而,没有什么可以阻止你从实验中找到常数——正如你所说:“在小的集合上做一些实验”。

#1


0  

Do some experiments. k-means is popular only because it usually runs much, much faster than that asymptotic bound would suggest.

做一些实验。k-means之所以受欢迎,仅仅是因为它的运行速度通常比渐近线所表示的快得多。

#2


0  

There is so much machine-related and algorithm-related specifics which is hidden in the big-O constant that it won't be possible to estimate it (for your machine and your SciPy) theoretically.

有太多与机器相关的和与算法相关的细节隐藏在big-O常量中,因此无法从理论上估计它(对于您的机器和SciPy)。

However nothing will prevent you from finding the constant experimentally - as you said: "do a few experiments on smaller sets".

然而,没有什么可以阻止你从实验中找到常数——正如你所说:“在小的集合上做一些实验”。