算法用O(kn log n)平衡K-D树

I tried to implement a balanced K-D tree with O(kn log n), I used presorted K arrays (sorted arrays for each index) to get O(kn log n), and median to get balanced tree.

我尝试使用O(kn log n)来实现均衡的K- d树，我使用预估的K数组(每个索引的排序数组)来获得O(kn log n)，中位数来获得均衡的树。

The problem I faced was that mostly the median value at some level ,for example the median for x axis, maybe chosen again at another subsequent level, for example for y axis.

我所面临的问题是，在某一层的中值，例如x轴的中值，可能在另一层再次选择，例如y轴。

I tried to solve this by dividing y sorted array to two arrays by using chosen x value as a pivot, but this way wouldn't yield a balanced tree.

我试着用选择的x值作为支点，把y排序数组分成两个数组来解决这个问题，但这种方法不会产生一个平衡的树。

Any idea how to get K-D balanced tree with O(kn log n)?

知道怎么用O(kn log n)得到K-D平衡树吗?

EDIT

编辑

Quoted from wiki https://en.wikipedia.org/wiki/K-d_tree

引用维基https://en.wikipedia.org/wiki/K-d_tree

Alternative algorithms for building a balanced k-d tree presort the data prior to building the tree. They then maintain the order of the presort during tree construction and hence eliminate the costly step of finding the median at each level of subdivision. Two such algorithms build a balanced k-d tree to sort triangles in order to improve the execution time of ray tracing for three-dimensional computer graphics. These algorithms presort n triangles prior to building the k-d tree, then build the tree in O(n log n) time in the best case.[5][6] An algorithm that builds a balanced k-d tree to sort points has a worst-case complexity of O(kn log n).[7] This algorithm presorts n points in each of k dimensions using an O(n log n) sort such as Heapsort or Mergesort prior to building the tree. It then maintains the order of these k presorts during tree construction and thereby avoids finding the median at each level of subdivision.

构建均衡的k-d树的替代算法在构建树之前先对数据进行预处理。然后，它们在构建树的过程中保持了预测的顺序，从而消除了在每个细分层次上寻找中位数的代价高昂的步骤。两种算法构建了一种均衡的k-d树对三角形进行排序，以提高三维计算机图形射线追踪的执行时间。这些算法在构建k-d树之前就会显示n个三角形，然后在最好的情况下在O(n log n)时间内构建树。[5][6]一种建立平衡k-d树的算法，它的复杂度是O(kn log n)的最坏情况。[7]该算法利用O(n log n)排序(如堆排序或归并排序)在构建树之前，在每个k维中显示n点。然后在树形结构中维护这些k个presorts的顺序，从而避免在每个级别的细分中找到中位数。

Anyone could provide such algorithm provided above?

任何人都可以提供上述算法吗?

EDIT

编辑

The came up with a way but it doesn't work if there is any duplicate value of specific axis for the median.

提出了一种方法，但是如果中间值的特定轴有重复值，它就不起作用。

For example

例如

x1 = [ (0, 7), (1, 3), (3, 0), (3, 1), (6, 2) ] y1 = [ (3, 0), (3, 1), (6, 2), (1, 3), (0, 7) ]

x1 =[(0,7),(1,3),(3,0)、(1),(2)]日元=[(3,0)、(3,1),(6 2),(1,3)(0,7)]

The median of x-axis is 3. So when we want to split the array y11 and y12 we have to use > and < to distribute y array left and right considering pivot as delimiter.

x轴的中值是3。所以当我们要分割数组y11和y12时我们必须使用>和 <来分配y数组的左和右，把主作为分隔符。< p>

there is no guarantee one of them is correct if the median a on specific axis is duplicated

如果在特定轴上的中位数a被重复，则不能保证其中一个是正确的

Consider the partition on x axis, and there is no problem on x1 array following completion of above example of first step partition:

考虑x轴上的分区，完成上述第一步分区示例后，x1数组没有问题:

median=(3,0)
The pivot = 3 // is it's the median of x axis
y11[],y12[] 
for(i = 0 ; i < x1.size;i++)
  if(y1[i].getX()<pivot)
    y11.add(y1[i])
  else 
    if(y1[i].getX()>pivot)
     y12.add(y1[i])

This will result y11 = [(2 ,1) , (1, 3), (0, 7) ] y12 = [ (6,2) ]

这将使得日元=[(2,1)、(1,3)(0,7)]日元=[(6 2)]

Any idea how to handle such case? Or is there any another presorting kd-tree presorting algorithm O(kn log n) ?

你知道怎么处理这种情况吗?或者有没有其他的预估kd-tree预估算法O(kn log n) ?

2 个解决方案

#1

To elaborate on my comment (and Anony-Mousse's answer, probably):

详细阐述我的评论(可能还有阿侬-摩丝的回答):

The key idea with pre-sorting in constructing KD-trees is to keep the order during split. The overhead looks quite high, a comparative benchmark with re-sorting (and k-select) seems in order.
Some proof-of principle Java source code:

构建kd -tree的关键思想是在分割时保持顺序。开销看起来相当高，重新排序(和k-select)的比较基准似乎是有序的。Java源代码的一些基本证明:

package net.*.coder.greybeard.sandbox;

import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedList;

/** finger exercise pre-sorting & split for KD-tree construction
 *  (re. https://*.com/q/35225509/3789665) */
public class KDPreSort {
 /** K-dimensional key, dimensions fixed
  *   by number of coordinates in construction */
    static class KKey {
        public static KKey[] NONE = {};
        final Comparable[]coordinates;
        public KKey(Comparable ...coordinates) {
            this.coordinates = coordinates;
        }
    /** @return {@code Comparator<KKey>} for coordinate {@code n}*/
        static Comparator<KKey> comparator(int n) { // could be cached
            return new Comparator<KDPreSort.KKey>() { @Override
                    public int compare(KKey l, KKey r) {
                        return l.coordinates[n]
                            .compareTo(r.coordinates[n]);
                    }
                };
        }
        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder(
                Arrays.deepToString(coordinates));
            sb.setCharAt(0, '(');
            sb.setCharAt(sb.length()-1, ')');
            return sb.toString();
        }
    }

 // static boolean trimLists = true; // introduced when ArrayList was used in interface

/** @return two arrays of {@code KKey}s: comparing smaller than
 *    or equal to {@code pivot} (according to {@code comp)},
 *    and greater than pivot -
 *    in the same order as in {@code keys}. */
    static KKey[][] split(KKey[] keys, KKey pivot, Comparator<KKey> comp) {
        int length = keys.length;
        ArrayList<KKey>
            se = new ArrayList<>(length),
            g = new ArrayList<>(length);
        for (KKey k: keys) {
        // pick List to add to
            List<KKey>d = comp.compare(k, pivot) <= 0 ? se : g;
            d.add(k);
        }
//      if (trimLists) { se.trimToSize(); g.trimToSize(); }
        return new KKey[][] { se.toArray(KKey.NONE), g.toArray(KKey.NONE) };
    }
 /** @return two arrays of <em>k</em> arrays of {@code KKey}s:
  *  comparing smaller than or equal to {@code pivot}
  *   (according to {@code comp)}, and greater than pivot,
  *  in the same order as in {@code keysByCoordinate}. */
    static KKey[][][]
        splits(KKey[][] keysByCoordinate, KKey pivot, Comparator<KKey> comp) {
        final int length = keysByCoordinate.length;
        KKey[][]
            se = new KKey[length][],
            g = new KKey[length][],
            splits;
        for (int i = 0 ; i < length ; i++) {
            splits = split(keysByCoordinate[i], pivot, comp);
            se[i] = splits[0];
            g[i] = splits[1];
        }
        return new KKey[][][] { se, g };
    }
 // demo
    public static void main(String[] args) {
    // from https://*.com/q/17021379/3789665
        Integer [][]coPairs = {// {0, 7}, {1, 3}, {3, 0}, {3, 1}, {6, 2},
                {12, 21}, {13, 27}, {19, 5}, {39, 5}, {49, 63}, {43, 45}, {41, 22}, {27, 7}, {20, 12}, {32, 11}, {24, 56},
            };
        KKey[] someKeys = new KKey[coPairs.length];
        for (int i = 0; i < coPairs.length; i++) {
            someKeys[i] = new KKey(coPairs[i]);
        }
    //presort
        Arrays.sort(someKeys, KKey.comparator(0));
        List<KKey> x = new ArrayList<>(Arrays.asList(someKeys));
        System.out.println("by x: " + x);
        KKey pivot = someKeys[someKeys.length/2];
        Arrays.sort(someKeys, KKey.comparator(1));
        System.out.println("by y: " + Arrays.deepToString(someKeys));
    // split by x
        KKey[][] allOrdered = new KKey[][] { x.toArray(KKey.NONE), someKeys },
            xSplits[] = splits(allOrdered, pivot, KKey.comparator(0));
        for (KKey[][] c: xSplits)
            System.out.println("split by x of " + pivot + ": "
                + Arrays.deepToString(c));
    // split "higher x" by y
        pivot = xSplits[1][1][xSplits[1][1].length/2];
        KKey[][] ySplits[] = splits(xSplits[1], pivot, KKey.comparator(1));
        for (KKey[][] c: ySplits)
            System.out.println("split by y of " + pivot + ": "
                + Arrays.deepToString(c));
    }
}

(Didn't find a suitable answer/implementation on SE while not investing too much energy. The output was non-convincing with your example, with the longer one, I had to re-format it to believe it.
The code looks ugly, in all likelihood because it is: if so inclined re-read about the licence of code posted on SE, an visit Code Review.) (Consider that there's voting, accepting, and awarding bounties, and re-visit Anony-Mousse's answer.)

(在SE上找不到合适的答案/实现，而没有投入太多精力。对于您的示例，输出是不可信的，对于较长的示例，我必须重新格式化它才能相信它。代码看起来很丑，很有可能是这样的:如果如此倾向于重新阅读发布在SE上的代码许可，访问代码审查。

#2

When splitting the data, you need to retain the sort order.

分割数据时，需要保持排序顺序。

E.g. using data (x,y) we build

例如，使用我们构建的数据(x,y)

x1 = [ (0, 7), (1, 3), (3, 0), (4, 2), (6, 1) ]
y1 = [ (3, 0), (6, 1), (3, 2), (1, 3), (0, 7) ]

If we split at x now, we need to filter both sets by the record at x=3,y=0.

如果我们现在在x处分割，我们需要根据记录x=3 y=0来过滤这两个集合。

I.e. split both lists, removing (3,0), all items with x<3 go to the first list each, all with x>3 go to the second (order unchanged):

即将两个列表分割，删除(3,0)，所有x<3的项分别到第一个列表，x>3到第二个列表(顺序不变):

x1 -> filter to  x11 = [ (0, 7), (1, 3) ]  x12 = [ (4, 2), (6, 1) ]
y1 -> filter to  y11 = [ (1, 3), (0, 7) ]  y12 = [ (6, 1), (4, 2) ]

The point is to filter each sorted list by the x values, while keeping the sorting order (so this is in O(n*k) in each of O(log n) levels). If you use only x1, and reconstruct y11 and y12 from x1, then you would need to sort again. By necessity, it is the same as if you sort once by x, once by y. Except that we did not sort again, only select.

重点是根据x值过滤每个排序后的列表，同时保持排序顺序(所以这是O(n*k)在每个O(log n)级别)。如果你只使用x1，然后从x1重构y11和y12，那么你需要重新排序。必然地，它和你对x排序一次，对y排序一次是一样的，除了我们没有再排序，只有选择。

I do not think this is much better in practise. Sorting is cheaper than the extra memory.

我不认为这在实践中更好。排序比额外的内存更便宜。

#1