So I think I'm going to get buried for asking such a trivial question but I'm a little confused about something.
I have implemented quicksort in Java and C and was doing some basic comparisons. The graph came out as two straight lines, with C being 4 ms faster than its Java counterpart over 100,000 random integers.
The code for my tests can be found here:
android-benchmarks
I wasn't sure what an n log n line would look like, but I didn't think it would be straight. I just wanted to check that this is the expected result and that I shouldn't go looking for an error in my code.
I stuck the formula into Excel, and for base 10 it seems to be a straight line with a kink at the start. Is this because the difference between log(n) and log(n+1) increases linearly?
Thanks,
Gav
7 Answers
#1
76
Make the graph bigger and you'll see that O(n log n) isn't quite a straight line. But yes, it is pretty close to linear behaviour. To see why, just take the logarithm of a few very large numbers.
For example (base 10):
log(1000000) = 6
log(1000000000) = 9
…
So, to sort 1,000,000 numbers, an O(n log n) sort adds a measly factor of 6 (or just a bit more, since most sorting algorithms depend on base-2 logarithms). Not an awful lot.
In fact, this log factor is so extraordinarily small that for most orders of magnitude, established O(n log n) algorithms outperform linear-time algorithms. A prominent example is the construction of a suffix array data structure.
A simple case of this bit me recently when I tried to improve a quicksort of short strings by employing radix sort. For short strings, this (linear-time) radix sort was indeed faster than quicksort, but there was a tipping point at still relatively short strings, since radix sort's running time crucially depends on the length of the strings you sort.
#2
10
FYI, quicksort is actually O(n^2) in the worst case, but with an average case of O(n log n).
FYI, there is a pretty big difference between O(n) and O(n log n). That's why n log n isn't bounded by c·n for any constant c.
For a graphical demonstration see:
#3
5
For even more fun in a similar vein, try plotting the time taken by n operations on the standard disjoint-set data structure. It has been shown to be asymptotically n α(n), where α(n) is the inverse of the Ackermann function (though your usual algorithms textbook will probably only show a bound of n log log n, or possibly n log* n). For any number you are likely to encounter as an input size, α(n) ≤ 5 (and indeed log* n ≤ 5), although it does approach infinity asymptotically.
What I suppose you can learn from this is that while asymptotic complexity is a very useful tool for thinking about algorithms, it is not quite the same thing as practical efficiency.
#4
3
- Usually, O(n*log(n)) algorithms are implemented with base-2 logarithms.
- For n = 1024, log2(1024) = 10, so n*log(n) = 1024*10 = 10240 calculations, an increase of one order of magnitude.
So, O(n*log(n)) is similar to linear only for a small amount of data.
Tip: don't forget that quicksort behaves very well on random data, and that it's not an O(n*log(n)) algorithm in the worst case.
#5
2
Any data can be plotted on a line if the axes are chosen correctly :-)
Wikipedia says Big-O is the worst case (i.e. f(x) is O(N) means f(x) is "bounded above" by N) https://en.wikipedia.org/wiki/Big_O_notation
Here is a nice set of graphs depicting the differences between various common functions: http://science.slc.edu/~jmarshall/courses/2002/spring/cs50/BigO/
The derivative of log(x) is 1/x. This is how quickly log(x) increases as x increases. It is not linear, though it may look like a straight line because it bends so slowly. When thinking of O(log(n)), I think of it as O(N^0+), i.e. the smallest power of N that is not a constant, since any positive constant power of N will overtake it eventually. It's not 100% accurate, so professors will get mad at you if you explain it that way.
Logs of two different bases differ only by a constant multiplier. Look up the formula for converting logs between two bases (under "change of base" here: https://en.wikipedia.org/wiki/Logarithm). The trick is to treat k and b as constants.
In practice, there are normally going to be some hiccups in any data you plot. There will be interference from things outside your program (something swapping onto the CPU ahead of your program, cache misses, etc.). It takes many runs to get reliable data. Constants are the biggest enemy of trying to apply Big-O notation to actual runtimes: an O(N) algorithm with a high constant can be slower than an O(N^2) algorithm for small enough N.
#6
1
log(N) is (very) roughly the number of digits in N. So, for the most part, there is little difference between log(n) and log(n+1).
#7
0
Try plotting an actual straight line on top of it and you'll see the small increase. Note that the Y value at 50,000 is less than 1/2 the Y value at 100,000.
It's there, but it's small. Which is why O(nlog(n)) is so good!