Performance of Java Sockets over RDMA (JSOR) vs jVerbs on InfiniBand

Date: 2021-05-16 22:46:09

I have a basic understanding of both JSOR and jVerbs.

Both work around the limitations of JNI and use a fast path to reduce latency. Both use the user-space Verbs RDMA interface to avoid context switches and provide fast-path access. Both also offer options for zero-copy transfer.

The difference is that JSOR still uses the Java socket interface, while jVerbs provides a new one. jVerbs also has something called stateful verb calls (SVCs) to avoid repeated serialization of RDMA requests, which the authors say reduces latency. jVerbs exposes a more native interface that applications can use directly. I read the jVerbs SoCC 2013 paper, where the authors build jverbsRPC on top of jVerbs and show that it significantly reduces the latency of ZooKeeper and memcached operations.
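
For illustration, here is a minimal sketch of the stateful-verb-call pattern, written against DiSNI-style names (SVCPostSend, IbvSendWR); the connection setup is omitted, and the exact signatures and package paths, which differ between the original jVerbs and the open-source DiSNI releases, should be treated as an assumption:

import java.io.IOException;
import java.util.LinkedList;
import com.ibm.disni.verbs.IbvSendWR;
import com.ibm.disni.verbs.SVCPostSend;

// Sketch only: endpoint creation, queue-pair setup and memory registration
// are omitted; names follow the open-source DiSNI examples, not the
// original jVerbs API.
public class SvcSketch {
    static void sendLoop(com.ibm.disni.RdmaEndpoint endpoint,
                         LinkedList<IbvSendWR> wrList,
                         int iterations) throws IOException {
        // Stateful verb call: the work-request list is serialized into its
        // native representation once, when the SVC object is created.
        SVCPostSend postSend = endpoint.postSend(wrList);
        for (int i = 0; i < iterations; i++) {
            postSend.execute(); // re-posts the same request, no re-serialization
            // ... poll the completion queue for the send completion here ...
        }
        postSend.free(); // release the cached native state
    }
}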

Documentation for both shows that they perform better than regular Java sockets over TCP/IP, SDP, and IPoIB.

I don't have any performance comparison between JSOR and jVerbs. I suspect jVerbs performs better, but with JSOR I don't have to change my existing code, because it keeps the standard Java socket interface. My question is: what performance gain can I expect from jVerbs relative to JSOR? Does anyone know, or have experience with the two? Any comparison data would be great; I could not find any.
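
To make the "no code changes" point concrete: a JSOR deployment keeps ordinary java.net.Socket code like the client below and only changes how the JVM is started. On the IBM JDK, RDMA use is selected by a configuration file named via a system property (-Dcom.ibm.net.rdma.conf, per my reading of IBM's documentation; verify the exact property name and file format). The host, port, and echo protocol here are made up for the example.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

// Unmodified socket client. Under JSOR the same bytecode can run over RDMA
// when the IBM JVM is started with an RDMA configuration, e.g. (verify the
// property name against the IBM documentation):
//   java -Dcom.ibm.net.rdma.conf=/path/to/rdma.conf EchoClient <host>
public class EchoClient {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket(args[0], 9999)) {
            OutputStream out = socket.getOutputStream();
            InputStream in = socket.getInputStream();
            byte[] msg = new byte[256];
            out.write(msg);                        // request
            out.flush();
            for (int read = 0; read < msg.length; ) {
                int n = in.read(msg, read, msg.length - read);
                if (n < 0) break;                  // server closed early
                read += n;                         // echoed response
            }
        }
    }
}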

2 Answers

#1


Here are some numbers using DiSNI, the newly open-sourced successor of IBM's jVerbs, and DaRPC, the low-latency RPC library built on DiSNI.

  • DiSNI RDMA read latencies for 64 bytes are below 2 microseconds
  • DaRPC RDMA send/recv latencies for 64 bytes (request and response) are around 5 microseconds
  • The differences between Java/DiSNI and C native RDMA are negligible for one-sided operations

These benchmarks were executed on two hosts connected via Mellanox ConnectX-3 network interfaces.

Here are the commands to execute the benchmarks:

(1) Read benchmark

Server:

java -cp disni-1.0-jar-with-dependencies.jar:disni-1.0-tests.jar com.ibm.disni.examples.benchmarks.AppLauncher -t java-rdma-server -a <address> -o read -s 64 -k 100000 -p

Client:

java -cp disni-1.0-jar-with-dependencies.jar:disni-1.0-tests.jar com.ibm.disni.examples.benchmarks.AppLauncher -t java-rdma-client -a <address> -o read -s 64 -k 100000 -p

(2) Send/recv benchmark

Server:

java -cp darpc-1.0-jar-with-dependencies.jar:darpc-1.0-tests.jar com.ibm.darpc.examples.server.DaRPCServer -a <address> -d -l 64 -r 64 

Client:

java -cp darpc-1.0-jar-with-dependencies.jar:darpc-1.0-tests.jar com.ibm.darpc.examples.client.DaRPCClient -a <address> -k 1000000 -l 64 -r 64 -b 1

#2

It is a bit hard to compare the performance of jVerbs vs JSOR. The first is a message-oriented API, while the second hides RDMA behind the stream-based API of Java sockets.

Here are some stats. My test used a pair of old ConnectX-2 cards and Dell PowerEdge 2970 servers, with CentOS 7.1 and Mellanox OFED version 3.1.

I was only interested in latency tests.

jVerbs

The test is a variation of the RPing sample (I can post it on GitHub if anybody is interested). It measured the latency of 5,000,000 cycles of the following sequence of calls over a reliable connection. The message size was 256 bytes.

PostSendMethod.execute()
PollCQMethod.execute()
CompletionChannel.ackCQEvents()
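
(Here PostSendMethod and PollCQMethod are jVerbs stateful verb call objects: the first execute() posts the pre-serialized send work request, the second polls the completion queue, and ackCQEvents() acknowledges the completion event so the channel can be reused. This reading follows the stateful-verb-call description above.)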

Results (microseconds):

  • Median: 10.885
  • 99.0% percentile: 11.663
  • 99.9% percentile: 17.471
  • 99.99% percentile: 27.791

JSOR

A similar test over a JSOR socket. The test was a textbook client/server socket sample; the message size was 256 bytes as well.
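
For reference, a textbook ping-pong measurement of this kind can look like the sketch below (my reconstruction, not the author's code): the client sends 256 bytes, blocks until the full echo returns, and records the round-trip time. The server is assumed to echo each message back verbatim; host and port are placeholders.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.util.Arrays;

// Reconstruction of a textbook socket latency test; assumes an echo server.
public class PingPongClient {
    public static void main(String[] args) throws Exception {
        final int iterations = 100_000, size = 256;
        long[] rtt = new long[iterations];
        try (Socket s = new Socket(args[0], 9999)) {
            s.setTcpNoDelay(true);                 // do not let Nagle batch sends
            OutputStream out = s.getOutputStream();
            InputStream in = s.getInputStream();
            byte[] buf = new byte[size];
            for (int i = 0; i < iterations; i++) {
                long t0 = System.nanoTime();
                out.write(buf);
                out.flush();
                for (int read = 0; read < size; ) { // wait for the full echo
                    int n = in.read(buf, read, size - read);
                    if (n < 0) throw new IllegalStateException("peer closed");
                    read += n;
                }
                rtt[i] = System.nanoTime() - t0;
            }
        }
        Arrays.sort(rtt);                           // simple percentile report
        System.out.printf("median %.3f us, 99.99%% %.3f us%n",
                rtt[iterations / 2] / 1000.0,
                rtt[(int) (iterations * 0.9999)] / 1000.0);
    }
}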

Results (microseconds):

  • Median: 43
  • 99.0% percentile: 55
  • 99.9% percentile: 61
  • 99.99% percentile: 217

These results are very far from the OFED latency test: on the same hardware and OS, the standard ib_send_lat benchmark produced a median of 2.77 microseconds and a maximum latency of 23.25 microseconds.
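
For comparison, that OFED figure comes from the standard perftest suite; a typical invocation for this message size and iteration count would look like the following (-s and -n are standard perftest options, but the exact invocation used for the numbers above is an assumption):

Server:

ib_send_lat -s 256 -n 5000000

Client:

ib_send_lat -s 256 -n 5000000 <server-address>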
