I have a JS application that needs to do a complicated sort of a large array and then display it. With my data, the built-in array.sort(cb) method can take up to 1 second, which is long enough to make my UI janky.
Because the UI is only tall enough to show a subset of the sorted array, with the rest below the fold or on later pages, I had an idea. What if I wrote an algorithm that went through the large array and quickly sorted it in such a way that the top N items were perfectly sorted, while the remaining items were only partially sorted? Each time I ran it, it would sort a little more of the array, from the top down.
So I could break my processing up into chunks and keep the UI smooth. For the first few seconds the array would not be perfectly sorted, but the imperfections would be below the fold, so nobody would notice them.
My naive solution would be to write my own selection sort that can stop after placing N items and resume later, but selection sort is a pretty terrible algorithm (O(n²)). As I understand it, the faster algorithms have to run to completion before they guarantee that the top N items are in their final positions.
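For reference, a minimal JavaScript sketch of that pausable selection sort, assuming a comparator in the same style as array.sort(cb) (negative when a comes first). The names createResumableSelectionSort, step and sortedCount are hypothetical, not from the question; each step(k) call places k more items in their final positions and remembers where it stopped.

```javascript
// Hypothetical helper: a selection sort that keeps its progress between calls,
// so it can place a few items, yield to the UI, and resume later.
function createResumableSelectionSort(items, compare) {
  let sorted = 0; // items[0 .. sorted-1] are already in their final positions

  return {
    // Place up to `count` more items; returns true once the whole array is sorted.
    step(count) {
      const end = Math.min(items.length, sorted + count);
      for (; sorted < end; sorted++) {
        // Find the smallest remaining item in items[sorted .. length-1].
        let min = sorted;
        for (let j = sorted + 1; j < items.length; j++) {
          if (compare(items[j], items[min]) < 0) min = j;
        }
        // Swap it into the next final slot.
        const tmp = items[sorted];
        items[sorted] = items[min];
        items[min] = tmp;
      }
      return sorted >= items.length;
    },
    // Number of leading items that are already final.
    get sortedCount() {
      return sorted;
    },
  };
}
```

Each step(k) still scans the whole unsorted tail, so placing the first k items costs roughly k * n comparisons; pausing is easy, but the quadratic cost is what makes selection sort a poor fit for a large array. A caller could, for example, run sorter.step(25) from a setTimeout callback until it returns true.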
Does anyone know of an existing solution for this? Am I crazy? Any suggestions?
UPDATE
Taking the idea suggested by @moreON, I wrote a custom QuickSort that bails out once it has the required precision, i.e. once the first N positions are final. The native sort took 1 second for this data. The regular QuickSort took around 250ms, which is already surprisingly better. The QuickSort that bails out once the first 100 items are sorted took a brisk 10ms, which is much, much better. I can then take an additional 250ms to finish the sort, but that doesn't matter much because the user is already looking at the data. This reduces the delay the user experiences from 1 second to 10ms, which is pretty great.
//Init 1.8 million random integers into two identical arrays
var arr1 = [];
var arr2 = [];
for (var i = 0; i < 1800000; i++) {
  var num = Math.floor(Math.random() * 1000000);
  arr1.push(num);
  arr2.push(num);
}
console.log(arr1);

//native sort
console.time("native sort");
arr1.sort(function(a, b) { return a - b; });
console.timeEnd("native sort"); //1sec
console.log(arr1);

//quicksort sort Ref: https://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/
function swap(arr, a, b) {
  var temp = arr[a];
  arr[a] = arr[b];
  arr[b] = temp;
}
//true if a should come before b
function cmp(a, b) {
  return (a < b);
}
//Partitions items[left..right] around the middle element; returns the start index of the right part
function partition(items, left, right) {
  var pivot = items[Math.floor((right + left) / 2)];
  var i = left;
  var j = right;
  while (i <= j) {
    while (cmp(items[i], pivot)) i++;
    while (cmp(pivot, items[j])) j--;
    if (i <= j) {
      swap(items, i, j);
      i++;
      j--;
    }
  }
  return i;
}
function quickSort(items, left, right, max) {
  //bail out early: a subarray starting after position max + 1 holds none of positions 0..max
  if (max && left - 1 > max) return items;
  if (items.length > 1) {
    var index = partition(items, left, right);
    if (left < index - 1) quickSort(items, left, index - 1, max);
    if (index < right) quickSort(items, index, right, max);
  }
  return items;
}

//sort first 100
console.time("partial Quicksort");
arr2 = quickSort(arr2, 0, arr2.length - 1, 100);
console.timeEnd("partial Quicksort"); //10ms
console.log(arr2);

//sort remainder (positions 0..99 are already final)
console.time("finishing Quicksort");
arr2 = quickSort(arr2, 100, arr2.length - 1); //250ms
console.timeEnd("finishing Quicksort");
console.log(arr2);
4 Answers
#1 (score: 2)
If you were to heapify the array, which I believe can be done in O(n) time (https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap), you could then extract each batch of N items, in order, in O(N log n) time (with n getting smaller as you extract).
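A minimal JavaScript sketch of that heap approach, assuming an array.sort(cb)-style comparator. The heap is built bottom-up (Floyd's method, O(n)); each next(N) call then pops the next N items in order for O(N log n). createHeapExtractor and next are hypothetical names, not from the answer, and the sketch works on a copy so the original array is not reordered.

```javascript
// Hypothetical helper: build a binary min-heap in O(n), then hand out
// the smallest remaining items in sorted order, a batch at a time.
function createHeapExtractor(input, compare) {
  const heap = input.slice(); // copy, so the caller's array is left untouched

  // Move heap[i] down until neither child is smaller than it.
  function siftDown(i) {
    const n = heap.length;
    for (;;) {
      const left = 2 * i + 1;
      const right = left + 1;
      let smallest = i;
      if (left < n && compare(heap[left], heap[smallest]) < 0) smallest = left;
      if (right < n && compare(heap[right], heap[smallest]) < 0) smallest = right;
      if (smallest === i) return;
      const tmp = heap[i];
      heap[i] = heap[smallest];
      heap[smallest] = tmp;
      i = smallest;
    }
  }

  // Bottom-up construction (Floyd's method): O(n) overall.
  for (let i = Math.floor(heap.length / 2) - 1; i >= 0; i--) siftDown(i);

  return {
    // Remove and return up to `count` of the smallest remaining items, in order.
    // Each removal costs O(log n).
    next(count) {
      const out = [];
      while (out.length < count && heap.length > 0) {
        out.push(heap[0]);
        const last = heap.pop();
        if (heap.length > 0) {
          heap[0] = last; // move the last leaf to the root and restore the heap
          siftDown(0);
        }
      }
      return out;
    },
  };
}
```

Rendering the first page would then be a single next(pageSize) call, with later calls made as the user scrolls or pages.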
#2 (score: 1)
Here is a cleaned-up version of my solution, which sorts a large array in batches so the JS thread doesn't stutter. In my example, it takes a 1-second array.sort(cb) and turns it into five separate 100ms operations. You'll want to choose pageSize based on your data: more pages make the complete sort take longer overall, while fewer pages make each batch take longer.
var BatchedQuickSort = {
  swap: function(arr, a, b) {
    var temp = arr[a];
    arr[a] = arr[b];
    arr[b] = temp;
  },
  //Partitions items[left..right] around the middle element; cmp follows the array.sort(cb) convention
  partition: function(items, left, right, cmp) {
    var pivot = items[Math.floor((right + left) / 2)];
    var i = left;
    var j = right;
    while (i <= j) {
      while (cmp(items[i], pivot) < 0) i++;
      while (cmp(pivot, items[j]) < 0) j--;
      if (i <= j) {
        this.swap(items, i, j);
        i++;
        j--;
      }
    }
    return i;
  },
  //Sorts items[left..right] until every position up to max is final (the whole range if max is falsy)
  sort: function(items, cmp, max, left, right) { //Ref: https://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/
    if (items.length > 1) {
      left = typeof left != "number" ? 0 : left;
      right = typeof right != "number" ? items.length - 1 : right;
      var index = this.partition(items, left, right, cmp);
      if (left < index - 1) this.sort(items, cmp, max, left, index - 1);
      //Only recurse right while that part can still contain positions <= max
      if (index < right && (!max || index <= max)) this.sort(items, cmp, max, index, right);
    }
    return items;
  }
};

//Example Usage
var arr = [];
for (var i = 0; i < 2000000; i++) arr.push(Math.floor(Math.random() * 1000000));
function myCompare(a, b) { return a - b; }
var pageSize = Math.floor(arr.length / 5);
var page = 1;
//Each tick finalizes the next page; the browser can render between ticks
var timer = window.setInterval(function() {
  arr = BatchedQuickSort.sort(arr, myCompare, pageSize * page, pageSize * (page - 1));
  if (page * pageSize >= arr.length) {
    clearInterval(timer);
    console.log("Done", arr);
  }
  page++;
}, 1);
#3 (score: 0)
I think your question boils down to:
How to find the top N elements in a large array
which is more or less answered here: Find top N elements in an Array
This can be solved by traversing the list once and keeping only the top N elements seen so far, which is Θ(n) for a fixed N.
Check it out here: https://jsfiddle.net/jeeh4a8p/1/
function get_top_10(list) {
  //Keeps the 10 largest values seen so far (fewer if the list is shorter)
  var top10 = [];
  for (var i = 0; i < list.length; i++) {
    if (top10.length < 10) {
      top10.push(list[i]);
      continue;
    }
    var smallest_in_top10 = Math.min.apply(Math, top10);
    if (list[i] > smallest_in_top10) {
      top10.splice(top10.indexOf(smallest_in_top10), 1);
      top10.push(list[i]);
    }
  }
  return top10.sort(sortNumber); //ascending order
}
console.log(get_top_10([1,2,3,4,5,6,7,8,9,10,11,12]))
var random_list = [];
for (var i = 0; i < 100; i++) {
  random_list.push(Math.round(Math.random() * 999999))
}
console.log(get_top_10(random_list))
function sortNumber(a, b) {
  return a - b;
}
#4 (score: -1)
First of all, keep your performance expectations in perspective. Efficient sorting algorithms are O(N * log2(N)). For N = 1,000,000 items, N * log2(N) ~ N * 20, so a full sort costs roughly as much as 20 linear passes over the data. I doubt the page you are trying to render has fewer items than that.
If you need to render the first 25 rows, selection sort will take about N * 25 comparisons to order them, so it will actually perform worse than a full sort, assuming comparable constant overhead.
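Spelling out that back-of-the-envelope comparison for N = 1,000,000 and a 25-row page, counting comparisons only and ignoring constant factors (the full sort is taken as about N log2 N comparisons, as in the answer; the selection-sort count is exact for 25 passes):

```latex
\begin{aligned}
\log_2(10^6) &= 6\log_2 10 \approx 6 \times 3.32 \approx 19.9 \\
N\log_2 N &\approx 10^6 \times 19.9 \approx 2.0 \times 10^7 &&\text{(full sort)} \\
\sum_{k=0}^{24} (N-1-k) &= 25(N-1) - \sum_{k=0}^{24} k = 25N - 325 \approx 2.5 \times 10^7 &&\text{(25 selection-sort passes)}
\end{aligned}
```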
If you do want to experiment with this further, one algorithm I can think of is this: maintain a binary tree of the PAGE_SIZE smallest items seen so far. Make a single pass over the data, and whenever you find an item smaller than the largest one in the tree, remove that largest item and insert the new one. Ignoring rebalancing, it will take about N * log2(PAGE_SIZE) operations to populate the tree and render your first page of results.
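A minimal JavaScript sketch of that single pass. It swaps in a binary max-heap for the binary tree the answer describes (a substitution, not the answer's own structure): the root is always the largest kept item, so checking and evicting it costs O(log2(PAGE_SIZE)) and no rebalancing is needed. smallestPage and its parameters are hypothetical names, and the comparator is assumed to follow the array.sort(cb) convention.

```javascript
// Hypothetical helper: one pass over `items`, keeping the `pageSize` smallest
// in a bounded max-heap (heap[0] is the largest item kept so far).
function smallestPage(items, pageSize, compare) {
  const heap = [];

  function swap(i, j) {
    const t = heap[i];
    heap[i] = heap[j];
    heap[j] = t;
  }
  // Move heap[i] up while it is larger than its parent.
  function siftUp(i) {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (compare(heap[i], heap[parent]) <= 0) return;
      swap(i, parent);
      i = parent;
    }
  }
  // Move heap[i] down while a child is larger than it.
  function siftDown(i) {
    const n = heap.length;
    for (;;) {
      const left = 2 * i + 1;
      const right = left + 1;
      let largest = i;
      if (left < n && compare(heap[left], heap[largest]) > 0) largest = left;
      if (right < n && compare(heap[right], heap[largest]) > 0) largest = right;
      if (largest === i) return;
      swap(i, largest);
      i = largest;
    }
  }

  for (const item of items) {
    if (heap.length < pageSize) {
      heap.push(item); // still filling the page
      siftUp(heap.length - 1);
    } else if (pageSize > 0 && compare(item, heap[0]) < 0) {
      heap[0] = item; // evict the current largest, O(log pageSize)
      siftDown(0);
    }
  }
  // Only pageSize items remain, so sorting them for display is cheap.
  return heap.sort(compare);
}
```

This only produces the first page; a later page would need another pass (or a different structure), so it fits the "show the first page quickly" case rather than replacing the full sort.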