Slow AES GCM encryption and decryption with Java 8u20

Time: 2021-04-12 18:17:35

I am trying to encrypt and decrypt data using AES/GCM/NoPadding. I installed the JCE Unlimited Strength Policy Files and ran the (simple-minded) benchmark below. I did the same with OpenSSL and was able to achieve more than 1 GB/s for encryption and decryption on my PC.

With the benchmark below I'm only able to get 3 MB/s encryption and decryption using Java 8 on the same PC. Any idea what I am doing wrong?

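import java.util.Random;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public static void main(String[] args) throws Exception {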
    final byte[] data = new byte[64 * 1024];
    final byte[] encrypted = new byte[64 * 1024];
    final byte[] key = new byte[32];
    final byte[] iv = new byte[12];
    final Random random = new Random(1);
    random.nextBytes(data);
    random.nextBytes(key);
    random.nextBytes(iv);

    System.out.println("Benchmarking AES-256 GCM encryption for 10 seconds");
    long javaEncryptInputBytes = 0;
    long javaEncryptStartTime = System.currentTimeMillis();
    final Cipher javaAES256 = Cipher.getInstance("AES/GCM/NoPadding");
    byte[] tag = new byte[16];
    long encryptInitTime = 0L;
    long encryptUpdate1Time = 0L;
    long encryptDoFinalTime = 0L;
    while (System.currentTimeMillis() - javaEncryptStartTime < 10000) {
        random.nextBytes(iv);
        long n1 = System.nanoTime();
        javaAES256.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(16 * Byte.SIZE, iv));
        long n2 = System.nanoTime();
        javaAES256.update(data, 0, data.length, encrypted, 0);
        long n3 = System.nanoTime();
        javaAES256.doFinal(tag, 0);
        long n4 = System.nanoTime();
        javaEncryptInputBytes += data.length;

        // Accumulate across iterations, as the decryption loop does
        encryptInitTime += n2 - n1;
        encryptUpdate1Time += n3 - n2;
        encryptDoFinalTime += n4 - n3;
    }
    long javaEncryptEndTime = System.currentTimeMillis();
    System.out.println("Time init (ns): "     + encryptInitTime);
    System.out.println("Time update (ns): "   + encryptUpdate1Time);
    System.out.println("Time do final (ns): " + encryptDoFinalTime);
    System.out.println("Java calculated at " + (javaEncryptInputBytes / 1024 / 1024 / ((javaEncryptEndTime - javaEncryptStartTime) / 1000)) + " MB/s");

    System.out.println("Benchmarking AES-256 GCM decryption for 10 seconds");
    long javaDecryptInputBytes = 0;
    long javaDecryptStartTime = System.currentTimeMillis();
    final GCMParameterSpec gcmParameterSpec = new GCMParameterSpec(16 * Byte.SIZE, iv);
    final SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
    long decryptInitTime = 0L;
    long decryptUpdate1Time = 0L;
    long decryptUpdate2Time = 0L;
    long decryptDoFinalTime = 0L;
    while (System.currentTimeMillis() - javaDecryptStartTime < 10000) {
        long n1 = System.nanoTime();
        javaAES256.init(Cipher.DECRYPT_MODE, keySpec, gcmParameterSpec);
        long n2 = System.nanoTime();
        int offset = javaAES256.update(encrypted, 0, encrypted.length, data, 0);
        long n3 = System.nanoTime();
        javaAES256.update(tag, 0, tag.length, data, offset);
        long n4 = System.nanoTime();
        javaAES256.doFinal(data, offset);
        long n5 = System.nanoTime();
        javaDecryptInputBytes += data.length;

        decryptInitTime += n2 - n1;
        decryptUpdate1Time += n3 - n2;
        decryptUpdate2Time += n4 - n3;
        decryptDoFinalTime += n5 - n4;
    }
    long javaDecryptEndTime = System.currentTimeMillis();
    System.out.println("Time init (ns): " + decryptInitTime);
    System.out.println("Time update 1 (ns): " + decryptUpdate1Time);
    System.out.println("Time update 2 (ns): " + decryptUpdate2Time);
    System.out.println("Time do final (ns): " + decryptDoFinalTime);
    System.out.println("Total bytes processed: " + javaDecryptInputBytes);
    System.out.println("Java calculated at " + (javaDecryptInputBytes / 1024 / 1024 / ((javaDecryptEndTime - javaDecryptStartTime) / 1000)) + " MB/s");
}
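
The throughput expressions in the benchmark above use integer division at every step, so both the megabyte count and the elapsed seconds are truncated before the final division. A minimal sketch of the same calculation in floating point follows; the class and method names (`Throughput`, `megabytesPerSecond`) are hypothetical and not part of the original benchmark.

```java
// Hypothetical helper, not part of the original benchmark.
class Throughput {
    /** Returns throughput in MiB per second for a byte count and an elapsed time in nanoseconds. */
    static double megabytesPerSecond(long bytes, long elapsedNanos) {
        double megabytes = bytes / (1024.0 * 1024.0);
        double seconds = elapsedNanos / 1_000_000_000.0;
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        // Example input: 30 MiB processed in 10.5 seconds.
        // The integer expression from the benchmark would compute 30 / 10 for this input.
        long bytes = 30L * 1024 * 1024;
        long nanos = 10_500_000_000L;
        System.out.printf("%.2f MB/s%n", megabytesPerSecond(bytes, nanos));
    }
}
```

At 3 MB/s the truncation is a large relative error, so measuring with System.nanoTime() around the whole loop and dividing in floating point gives a more trustworthy figure.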

EDIT: I leave it as a fun exercise to improve this simple-minded benchmark.

I've tested some more using the Server VM, removed the nanoTime calls and introduced a warm-up phase, but as I expected none of this improved the benchmark results. Throughput stays flat at 3 megabytes per second.

3 Solutions

#1


18  

Micro-benchmarking aside, the performance of the GCM implementation in JDK 8 (at least up to 1.8.0_25) is crippled.

I can consistently reproduce the 3MB/s (on a Haswell i7 laptop) with a more mature micro-benchmark.

From a code dive, this appears to be due to a naive multiplier implementation and no hardware acceleration for the GCM calculations.
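
To illustrate why a naive multiplier is costly, here is a sketch of a bit-at-a-time multiplication in GF(2^128), written in the style of the reference algorithm in NIST SP 800-38D. It is not the JDK's source code; the class name `GhashMultiply` and the two-`long` block representation are assumptions made for this example. GHASH performs one such multiplication per 16-byte block, and this version needs 128 loop iterations for each of them.

```java
// Illustrative sketch only; not taken from the JDK.
class GhashMultiply {
    // Reduction constant R = 11100001 || 0^120 (NIST SP 800-38D), high 64 bits.
    private static final long R_HI = 0xE100000000000000L;

    /**
     * Multiplies x and y in GF(2^128) using GCM's bit ordering.
     * Each block is {high 64 bits, low 64 bits}; bit 0 is the most significant bit of the high word.
     */
    static long[] multiply(long[] x, long[] y) {
        long zHi = 0, zLo = 0;
        long vHi = y[0], vLo = y[1];
        for (int i = 0; i < 128; i++) {
            long word = (i < 64) ? x[0] : x[1];
            long bit = (word >>> (63 - (i & 63))) & 1L;
            if (bit != 0) {          // add V into Z when bit i of x is set
                zHi ^= vHi;
                zLo ^= vLo;
            }
            boolean lsb = (vLo & 1L) != 0;
            vLo = (vLo >>> 1) | (vHi << 63);   // V = V >> 1 across both words
            vHi >>>= 1;
            if (lsb) {
                vHi ^= R_HI;         // reduce modulo the field polynomial
            }
        }
        return new long[] { zHi, zLo };
    }

    public static void main(String[] args) {
        long[] one = { 0x8000000000000000L, 0L };   // multiplicative identity in GCM bit order
        long[] a = { 0x0123456789ABCDEFL, 0xFEDCBA9876543210L };
        long[] p = multiply(a, one);
        System.out.println(p[0] == a[0] && p[1] == a[1]);
    }
}
```

Faster software implementations typically replace the inner loop with table lookups, and hardware-accelerated ones use a carry-less multiply instruction.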

By comparison, AES (in ECB or CBC mode) in JDK 8 uses an AES-NI accelerated intrinsic and is (for Java at least) very quick (on the order of 1GB/s on the same hardware), but the overall AES/GCM performance is completely dominated by the broken GCM performance.

There are plans to implement hardware acceleration, and there have been third-party submissions to improve the performance, but these haven't made it into a release yet.

Something else to be aware of is that the JDK GCM implementation also buffers the entire plaintext on decryption until the authentication tag at the end of the ciphertext is verified, which cripples it for use with large messages.
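
The buffering can be observed directly by looking at how many bytes each call releases during decryption. The sketch below is a demonstration under stated assumptions: the class name `GcmDecryptBuffering` is hypothetical, and the all-zero key and IV are for illustration only and must never be used for real data. On JDK versions that buffer, update() is expected to report 0 bytes and doFinal() to release the whole plaintext, but the exact numbers depend on the provider and version.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Illustrative sketch only.
class GcmDecryptBuffering {
    /** Encrypts and then decrypts the input, printing how many bytes each decrypt call released. */
    static byte[] roundTrip(byte[] plaintext) {
        try {
            SecretKeySpec key = new SecretKeySpec(new byte[16], "AES");       // all-zero demo key
            GCMParameterSpec spec = new GCMParameterSpec(128, new byte[12]);  // fixed demo IV

            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, key, spec);
            byte[] ciphertext = enc.doFinal(plaintext);   // ciphertext followed by the 16-byte tag

            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, key, spec);
            // Over-allocate: some JDK versions check the output size in update()
            // against the input length even when nothing is written yet.
            byte[] out = new byte[ciphertext.length];
            int fromUpdate = dec.update(ciphertext, 0, ciphertext.length, out, 0);
            int fromFinal = dec.doFinal(out, fromUpdate);
            System.out.println("update() released " + fromUpdate
                    + " bytes, doFinal() released " + fromFinal + " bytes");
            return Arrays.copyOf(out, fromUpdate + fromFinal);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[64 * 1024];
        Arrays.fill(data, (byte) 42);
        System.out.println("round trip ok: " + Arrays.equals(data, roundTrip(data)));
    }
}
```

Because nothing is released before the tag is checked, memory use during decryption grows with the message size, which is the practical problem for large messages.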

Bouncy Castle has (at the time of writing) faster GCM implementations (and OCB, if you're writing open source software or are not encumbered by software patent laws).

Updated July 2015 - 1.8.0_45 and JDK 9

JDK 8+ will get an improved (and constant-time) Java implementation (contributed by Florian Weimer of Red Hat) - this has landed in JDK 9 EA builds, but apparently not yet in 1.8.0_45. JDK 9 (since EA b72 at least) also has GCM intrinsics. AES/GCM speed on b72 is 18MB/s without intrinsics enabled and 25MB/s with intrinsics enabled, both of which are disappointing. For comparison, the fastest (not constant-time) BC implementation is ~60MB/s and the slowest (constant-time, not fully optimised) is ~26MB/s.
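
When comparing runs with and without intrinsics, it helps to confirm what the running JVM actually has enabled. The sketch below queries HotSpot flags at runtime through `HotSpotDiagnosticMXBean`. The flag names `UseAES`, `UseAESIntrinsics` and `UseGHASHIntrinsics` are my assumption of the relevant HotSpot options and are not named in the answer above; older builds may not know all of them, which the code reports instead of failing. The class name `IntrinsicFlags` is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch only; assumes a HotSpot-based JVM.
class IntrinsicFlags {
    /** Returns the current value of a HotSpot flag, or a short explanation if it cannot be read. */
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable (no HotSpot diagnostic bean)";
            }
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the flag does not exist on this JVM build.
            return "unknown on this JVM";
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] { "UseAES", "UseAESIntrinsics", "UseGHASHIntrinsics" }) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```

Running the same benchmark once with a flag switched on and once with it switched off (for example via `-XX:-UseAESIntrinsics`) then makes the two measurements directly comparable.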

Updated Jan 2016 - 1.8.0_72:

Some performance fixes landed in JDK 1.8.0_60 and performance on the same benchmark now is 18MB/s - a 6x improvement from the original, but still much slower than the BC implementations.

#2


3  

This has now been partially addressed in Java 8u60 with JDK-8069072. Without this fix I get 2.5 MB/s. With this fix I get 25 MB/s. Disabling GCM completely gives me 60 MB/s.

To disable GCM completely, create a file named java.security containing the following line:

jdk.tls.disabledAlgorithms=SSLv3,GCM

Then start your Java process with:

java -Djava.security.properties=/path/to/my/java.security ...

If this doesn't work, you may need to enable overriding security properties by editing /usr/java/default/jre/lib/security/java.security (actual path may be different depending on OS) and adding:

security.overridePropertiesFile=true
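
To confirm that the override was picked up, the effective value can be read back from inside the process with `java.security.Security.getProperty`. This is a minimal sketch; the class name `DisabledAlgorithmsCheck` and the helper `gcmDisabled` are hypothetical, and the check only looks for a literal `GCM` entry in the comma-separated list.

```java
import java.security.Security;

// Illustrative sketch only.
class DisabledAlgorithmsCheck {
    /** Returns true if the comma-separated property value contains an entry equal to "GCM". */
    static boolean gcmDisabled(String propertyValue) {
        if (propertyValue == null) {
            return false;
        }
        for (String entry : propertyValue.split(",")) {
            if (entry.trim().equalsIgnoreCase("GCM")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = Security.getProperty("jdk.tls.disabledAlgorithms");
        System.out.println("jdk.tls.disabledAlgorithms = " + value);
        System.out.println("GCM listed: " + gcmDisabled(value));
    }
}
```

If the printed value still shows the JDK default, the file passed via -Djava.security.properties was not applied.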

#3


0  

The OpenSSL implementation is optimized with an assembly routine that uses the pclmulqdq instruction (x86 platform). It is very fast thanks to the parallelized algorithm.
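
For readers unfamiliar with it, pclmulqdq computes a carry-less product: a multiplication in which partial products are combined with XOR instead of addition. The instruction multiplies two 64-bit operands into a 128-bit result in hardware. The sketch below shows the same operation in software at reduced width (32-bit inputs, 64-bit result) purely to illustrate what is being computed; the class and method names are hypothetical.

```java
// Illustrative software model of carry-less multiplication; not how OpenSSL or HotSpot implement it.
class CarrylessMultiply {
    /** Carry-less product of two 32-bit values (treated as unsigned), returned as a 64-bit value. */
    static long clmul32(int a, int b) {
        long x = a & 0xFFFFFFFFL;
        long result = 0;
        for (int i = 0; i < 32; i++) {
            if (((b >>> i) & 1) != 0) {
                result ^= x << i;   // XOR in the shifted operand instead of adding it
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // (x + 1) * (x + 1) = x^2 + 1 over GF(2), i.e. 0b11 * 0b11 = 0b101
        System.out.println(Long.toBinaryString(clmul32(0b11, 0b11)));   // prints 101
    }
}
```

A single hardware instruction replaces the whole loop, which is a large part of why the OpenSSL GHASH code is so much faster than a bitwise software multiplier.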

The Java implementation is slow, but it has also been optimized in HotSpot using an assembly routine (not parallelized). You have to warm up the JVM before the HotSpot intrinsic is used. The default value of -XX:CompileThreshold is 10000.

// pseudocode
warmUp_GCM_cipher_loop10000_times();
do_benchmark();
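
The pseudocode above could be fleshed out roughly as follows. This is a sketch under stated assumptions, not the answerer's code: the class name `GcmWarmupBenchmark`, the 1 KiB warm-up buffer and the iteration counts are choices made for this example. The all-zero key and counter-based IV are for benchmarking only, since a key/IV pair must never be reused for real data. AES-128 is used so that the example runs without the unlimited-strength policy files.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

// Illustrative sketch only.
class GcmWarmupBenchmark {
    /** Encrypts data the given number of times with a distinct IV per iteration; returns elapsed nanoseconds. */
    static long encryptLoop(byte[] key, byte[] data, int iterations) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
            byte[] iv = new byte[12];
            byte[] out = new byte[data.length + 16];   // room for ciphertext plus 16-byte tag
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                // Counter-based IV: differs on every iteration (benchmark use only).
                iv[8] = (byte) (i >>> 24);
                iv[9] = (byte) (i >>> 16);
                iv[10] = (byte) (i >>> 8);
                iv[11] = (byte) i;
                cipher.init(Cipher.ENCRYPT_MODE, keySpec, new GCMParameterSpec(128, iv));
                cipher.doFinal(data, 0, data.length, out, 0);
            }
            return System.nanoTime() - start;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];                     // all-zero AES-128 demo key
        encryptLoop(key, new byte[1024], 10_000);      // warm-up run, timing discarded

        byte[] data = new byte[64 * 1024];
        int iterations = 200;
        long nanos = encryptLoop(key, data, iterations);
        double mib = (double) data.length * iterations / (1024.0 * 1024.0);
        System.out.printf("%.1f MB/s%n", mib / (nanos / 1e9));
    }
}
```

The warm-up uses small buffers so that it finishes in a few seconds even on a slow GCM implementation. Whether 10,000 calls are enough to trigger compilation depends on the JVM's compilation settings.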
