Haskell与C ++中的简单π（x）

I'm learning Haskell. My interest is to use it for personal computer experimentation. Right now, I'm trying to see how fast Haskell can get. Many claim parity with C(++), and if that is true, I would be very happy (I should note that I will be using Haskell whether or not it's fast, but fast is still a good thing).

我正在学习Haskell。我的兴趣是将它用于个人计算机实验。现在,我正试着看看Haskell有多快。许多人声称与C(++)相同,如果这是真的,我会非常高兴(我应该注意到我将使用Haskell,无论它是否快,但快速仍然是一件好事)。

My test program implements π(x) with a very simple algorithm: Primes numbers add 1 to the result. Prime numbers have no integer divisors between 1 and √x. This is not an algorithm battle, this is purely for compiler performance.

我的测试程序用一个非常简单的算法实现π(x):Primes数字为结果加1。素数在1和√x之间没有整数除数。这不是算法之争,这纯粹是为了编译器性能。

Haskell seems to be about 6x slower on my computer, which is fine (still 100x faster than pure Python), but that could be just because I'm a Haskell newbie.

Haskell似乎在我的计算机上慢了大约6倍,这很好(仍然比纯Python快100倍),但这可能只是因为我是一个Haskell新手。

Now, my question: How, without changing the algorithm, can I optimize the Haskell implementation? Is Haskell really on performance parity with C?

现在,我的问题是:如何在不改变算法的情况下优化Haskell实现? Haskell真的与C的性能平价吗?

Here is my Haskell code:

这是我的Haskell代码:

import System.Environment

-- a simple integer square root
isqrt :: Int -> Int
isqrt = floor . sqrt . fromIntegral

-- primality test        
prime :: Int -> Bool
prime x = null [x | q <- [3, 5..isqrt x], rem x q == 0]

main = do
  n <- fmap (read . head) getArgs
  print $ length $ filter prime (2:[3, 5..n])

Here is my C++ code:

这是我的C ++代码:

#include <iostream>
#include <cmath>
#include <cstdlib>
using namespace std;

bool isPrime(int);

int main(int argc, char* argv[]) {
    int primes = 10000, count = 0;
    if (argc > 1) {
        primes = atoi(argv[1]);
    }
    if (isPrime(2)) {
        count++;
    }
    for (int i = 3; i <= primes; i+=2) {
        if (isPrime(i)){
            count++;
        }
    }
    cout << count << endl;
    return 0;
}

bool isPrime(int x){
    for (int i = 2; i <= floor(sqrt(x)); i++) {
        if (x % i == 0) {
            return false;
        }
    }
    return true;
}

2 个解决方案

#1

Your Haskell version is constructing a lazy list in prime only to test if it is null. This seems to indeed be a bottle neck. The following version runs just as fast as the C++ version on my machine:

你的Haskell版本只是在prime中构造一个惰性列表,以测试它是否为null。这似乎确实是瓶颈。以下版本的运行速度与我机器上的C ++版本一样快:

prime :: Int -> Bool
prime x = go 3
  where
    go q | q <= isqrt x = if rem x q == 0 then False else go (q+2)
    go _  = True

3.31s when compiled with -O2 vs. 3.18s for C++ with gcc 4.8 and -O3 for n=5000000.

使用-O2编译时为3.31s,对于C ++编译为3.18s,gcc为4.8,-O3为n = 5000000。

Of course, 'guessing' where the program is slow to optimize it is not a very good approach. Fortunately, Haskell has good profiling tools on board.

当然,“猜测”程序缓慢优化它并不是一个很好的方法。幸运的是,Haskell拥有良好的分析工具。

Compiling and running with

编译和运行

$ ghc --make primes.hs -O2 -prof -auto-all -fforce-recomp && ./primes 5000000 +RTS -p

gives

# primes.prof
  Thu Feb 20 00:49 2014 Time and Allocation Profiling Report  (Final)

     primes +RTS -p -RTS 5000000

  total time  =        5.71 secs   (5710 ticks @ 1000 us, 1 processor)
  total alloc = 259,580,976 bytes  (excludes profiling overheads)

COST CENTRE MODULE  %time %alloc

prime.go    Main     96.4    0.0
main        Main      2.0   84.6
isqrt       Main      0.9   15.4

                                                      individual     inherited
COST CENTRE MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN        MAIN                     45           0    0.0    0.0   100.0  100.0
 main       Main                     91           0    2.0   84.6   100.0  100.0
  prime     Main                     92     2500000    0.7    0.0    98.0   15.4
   prime.go Main                     93   326103491   96.4    0.0    97.3   15.4
    isqrt   Main                     95           0    0.9   15.4     0.9   15.4

--- >8 ---

which clearly shows that prime is where things get hot. For more information on profiling, I'll refer you to Real World Haskell, Chap 25.

这清楚地表明,素数是事物变热的地方。有关分析的更多信息,我将向您介绍真实世界Haskell,第25章。

To really understand what is going on, you can look at (one of) GHC's intermediate languages Core, which will show you how the code looks like after optimization. Some good info is at the Haskell wiki. I would not recommend to do that unless necessary, but it is good to know that the possibility exists.

要真正了解正在发生的事情,您可以查看(其中一个)GHC的中间语言Core,它将向您展示优化后代码的外观。一些好的信息在Haskell维基。除非必要,否则我不建议这样做,但很高兴知道存在这种可能性。

To your other questions:

对于你的其他问题:

1) How, without changing the algorithm, can I optimize the Haskell implementation?

1)如何在不改变算法的情况下优化Haskell实现?

Profile, and try to write inner loops so that they don't do any memory allocations and can be made strict by the compiler. Doing so can take some practice and experience.

配置文件,并尝试编写内部循环,以便它们不进行任何内存分配,并且可以由编译器严格控制。这样做可以采取一些实践和经验。

2) Is Haskell really on performance parity with C?

2)Haskell真的与C的性能平价吗?

That depends. GHC is amazing and can often optimize your program very well. If you know what you're doing you can usually get close to the performance of optimized C (100% - 200% of C's speed). That said, these optimizations are not always easy or pretty to the eye and high level Haskell can be slower. But don't forget that you're gaining amazing expressiveness and high level abstractions when using Haskell. It will usually be fast enough for all but the most performance critical applications and even then you can often get pretty close to C with some profiling and performance optimizations.

那要看。 GHC非常棒,通常可以很好地优化您的计划。如果您知道自己在做什么,通常可以接近优化C的性能(100% - C的速度的200%)。也就是说,这些优化并不总是容易或看起来很漂亮,而高级Haskell可能会更慢。但是不要忘记在使用Haskell时你获得了惊人的表现力和高水平的抽象。对于除了性能最关键的应用程序之外的所有应用程序来说,它通常都足够快,即便如此,通过一些性能分析和性能优化,您通常可以非常接近C。

#2

I dont think that the Haskell version (original and improved by first answer) are equivalent with the C++ version. The reason is this: Both only consider every second element (in the prime function), while the C++ version scans every element (only i++ in the isPrime() function.

我不认为Haskell版本(原始和第一回答改进)与C ++版本相同。原因是:两者都只考虑每个第二个元素(在prime函数中),而C ++版本扫描每个元素(在isPrime()函数中只有i ++。

When i fix this (change i++ to i+=2 in the isPrime() function for C++) i get down to almost 1/3 of the runtime of the optimized Haskell version (2.1s C++ vs 6s Haskell).

当我解决这个问题时(在C ++的isPrime()函数中将i ++更改为i + = 2)我得到了优化的Haskell版本(2.1s C ++ vs 6s Haskell)运行时的近1/3。

The output remains the same for both (of course). Note that this is no specific opimization of the C++ version ,just an adaptation of the trick already applied in the Haskell version.

两者的输出保持不变(当然)。请注意,这不是C ++版本的特定选择,只是对Haskell版本中已经应用的技巧的改编。

#1