I fell across a spreadsheet that explains a method to sort both rows and columns of a matrix that contains binary data so that the number of changes between consecutive rows and cols is minimzed.
我遇到了一个电子表格,该电子表格解释了一种方法,用于对包含二进制数据的矩阵的行和列进行排序,以便最小化连续行和列之间的更改次数。
For example, starting with:
例如,从以下开始:
After 15 manual steps described in the tabs of the spreadsheed, the following table is obtained:
在传感器选项卡中描述的15个手动步骤之后,获得下表:
I would like to know:
我想知道:
- what is the common name of this algorithm or method ?
- 这种算法或方法的通用名称是什么?
- how to apply it to larger table (where 2^n would overflow...)
- 如何将它应用于更大的表(2 ^ n将溢出...)
- how to generalize it to non binary data, for example using Levenshtein distance ?
- 如何将它推广到非二进制数据,例如使用Levenshtein距离?
- if there is any link to code (Excel VBA, Python, ...) already implementing this (otherwise I'll write it ... )
- 如果有任何代码链接(Excel VBA,Python,...)已经实现了这一点(否则我会写它...)
Thanks !
谢谢 !
1 个解决方案
#1
2
You can represent each row by a vector L = [1, 1, 0, ... 1]
, and then define the distance between two lines d(L0, L1)
by the number of elements at corresponding positions which are different between L0
and L1
. This is known as the binary Hamming distance. If you had non-binary data, you would just extend your definition of distance and yes, Levenshtein distance would be an option.
您可以用向量L = [1,1,0,... 1]表示每一行,然后用L0之间不同的相应位置的元素数定义两条线d(L0,L1)之间的距离和L1。这被称为二进制汉明距离。如果你有非二进制数据,你只需要扩展你的距离定义,是的,Levenshtein距离是一个选项。
Once you have distance well-defined, the rest of your problem is minimizing distance between consecutive rows. This is exactly the Traveling salesman problem, which is known to be NP-hard(http://www.diku.dk/hjemmesider/ansatte/jyrki/Paper/EKP85.pdf).
一旦你有明确的距离,你的问题的其余部分是最小化连续行之间的距离。这正是旅行商问题,已知是NP难题(http://www.diku.dk/hjemmesider/ansatte/jyrki/Paper/EKP85.pdf)。
The direct solution (visiting all permutations) is O(n!), but you can do better easily by using dynamic programming, for example Held–Karp_algorithm. There are also approximate algorithms, such as the Nearest_neighbour_algorithm which quickly computes a non-optimal solution.
直接解决方案(访问所有排列)是O(n!),但您可以通过使用动态编程轻松做得更好,例如Held-Karp_algorithm。还有近似算法,例如Nearest_neighbour_algorithm,它可以快速计算非最优解。
Finally, for implementations you can easily google "traveling salesman excel/python" and find many tutorials and examples.
最后,对于实现,您可以轻松谷歌“旅行推销员excel / python”,并找到许多教程和示例。
#1
2
You can represent each row by a vector L = [1, 1, 0, ... 1]
, and then define the distance between two lines d(L0, L1)
by the number of elements at corresponding positions which are different between L0
and L1
. This is known as the binary Hamming distance. If you had non-binary data, you would just extend your definition of distance and yes, Levenshtein distance would be an option.
您可以用向量L = [1,1,0,... 1]表示每一行,然后用L0之间不同的相应位置的元素数定义两条线d(L0,L1)之间的距离和L1。这被称为二进制汉明距离。如果你有非二进制数据,你只需要扩展你的距离定义,是的,Levenshtein距离是一个选项。
Once you have distance well-defined, the rest of your problem is minimizing distance between consecutive rows. This is exactly the Traveling salesman problem, which is known to be NP-hard(http://www.diku.dk/hjemmesider/ansatte/jyrki/Paper/EKP85.pdf).
一旦你有明确的距离,你的问题的其余部分是最小化连续行之间的距离。这正是旅行商问题,已知是NP难题(http://www.diku.dk/hjemmesider/ansatte/jyrki/Paper/EKP85.pdf)。
The direct solution (visiting all permutations) is O(n!), but you can do better easily by using dynamic programming, for example Held–Karp_algorithm. There are also approximate algorithms, such as the Nearest_neighbour_algorithm which quickly computes a non-optimal solution.
直接解决方案(访问所有排列)是O(n!),但您可以通过使用动态编程轻松做得更好,例如Held-Karp_algorithm。还有近似算法,例如Nearest_neighbour_algorithm,它可以快速计算非最优解。
Finally, for implementations you can easily google "traveling salesman excel/python" and find many tutorials and examples.
最后,对于实现,您可以轻松谷歌“旅行推销员excel / python”,并找到许多教程和示例。