通过相似性对行和列进行排序的算法

时间:2022-06-17 21:28:43

I fell across a spreadsheet that explains a method to sort both rows and columns of a matrix that contains binary data so that the number of changes between consecutive rows and cols is minimzed.

我遇到了一个电子表格,该电子表格解释了一种方法,用于对包含二进制数据的矩阵的行和列进行排序,以便最小化连续行和列之间的更改次数。

For example, starting with:

例如,从以下开始:

通过相似性对行和列进行排序的算法

After 15 manual steps described in the tabs of the spreadsheed, the following table is obtained:

在传感器选项卡中描述的15个手动步骤之后,获得下表:

通过相似性对行和列进行排序的算法

I would like to know:

我想知道:

  1. what is the common name of this algorithm or method ?
  2. 这种算法或方法的通用名称是什么?
  3. how to apply it to larger table (where 2^n would overflow...)
  4. 如何将它应用于更大的表(2 ^ n将溢出...)
  5. how to generalize it to non binary data, for example using Levenshtein distance ?
  6. 如何将它推广到非二进制数据,例如使用Levenshtein距离?
  7. if there is any link to code (Excel VBA, Python, ...) already implementing this (otherwise I'll write it ... )
  8. 如果有任何代码链接(Excel VBA,Python,...)已经实现了这一点(否则我会写它...)

Thanks !

谢谢 !

1 个解决方案

#1


2  

You can represent each row by a vector L = [1, 1, 0, ... 1], and then define the distance between two lines d(L0, L1) by the number of elements at corresponding positions which are different between L0 and L1. This is known as the binary Hamming distance. If you had non-binary data, you would just extend your definition of distance and yes, Levenshtein distance would be an option.

您可以用向量L = [1,1,0,... 1]表示每一行,然后用L0之间不同的相应位置的元素数定义两条线d(L0,L1)之间的距离和L1。这被称为二进制汉明距离。如果你有非二进制数据,你只需要扩展你的距离定义,是的,Levenshtein距离是一个选项。

Once you have distance well-defined, the rest of your problem is minimizing distance between consecutive rows. This is exactly the Traveling salesman problem, which is known to be NP-hard(http://www.diku.dk/hjemmesider/ansatte/jyrki/Paper/EKP85.pdf).

一旦你有明确的距离,你的问题的其余部分是最小化连续行之间的距离。这正是旅行商问题,已知是NP难题(http://www.diku.dk/hjemmesider/ansatte/jyrki/Paper/EKP85.pdf)。

The direct solution (visiting all permutations) is O(n!), but you can do better easily by using dynamic programming, for example Held–Karp_algorithm. There are also approximate algorithms, such as the Nearest_neighbour_algorithm which quickly computes a non-optimal solution.

直接解决方案(访问所有排列)是O(n!),但您可以通过使用动态编程轻松做得更好,例如Held-Karp_algorithm。还有近似算法,例如Nearest_neighbour_algorithm,它可以快速计算非最优解。

Finally, for implementations you can easily google "traveling salesman excel/python" and find many tutorials and examples.

最后,对于实现,您可以轻松谷歌“旅行推销员excel / python”,并找到许多教程和示例。

#1


2  

You can represent each row by a vector L = [1, 1, 0, ... 1], and then define the distance between two lines d(L0, L1) by the number of elements at corresponding positions which are different between L0 and L1. This is known as the binary Hamming distance. If you had non-binary data, you would just extend your definition of distance and yes, Levenshtein distance would be an option.

您可以用向量L = [1,1,0,... 1]表示每一行,然后用L0之间不同的相应位置的元素数定义两条线d(L0,L1)之间的距离和L1。这被称为二进制汉明距离。如果你有非二进制数据,你只需要扩展你的距离定义,是的,Levenshtein距离是一个选项。

Once you have distance well-defined, the rest of your problem is minimizing distance between consecutive rows. This is exactly the Traveling salesman problem, which is known to be NP-hard(http://www.diku.dk/hjemmesider/ansatte/jyrki/Paper/EKP85.pdf).

一旦你有明确的距离,你的问题的其余部分是最小化连续行之间的距离。这正是旅行商问题,已知是NP难题(http://www.diku.dk/hjemmesider/ansatte/jyrki/Paper/EKP85.pdf)。

The direct solution (visiting all permutations) is O(n!), but you can do better easily by using dynamic programming, for example Held–Karp_algorithm. There are also approximate algorithms, such as the Nearest_neighbour_algorithm which quickly computes a non-optimal solution.

直接解决方案(访问所有排列)是O(n!),但您可以通过使用动态编程轻松做得更好,例如Held-Karp_algorithm。还有近似算法,例如Nearest_neighbour_algorithm,它可以快速计算非最优解。

Finally, for implementations you can easily google "traveling salesman excel/python" and find many tutorials and examples.

最后,对于实现,您可以轻松谷歌“旅行推销员excel / python”,并找到许多教程和示例。