Spark Streaming using fewer executors than available

时间:2022-12-15 20:52:00

I am using Spark Streaming to process some events. It is deployed in standalone mode with 1 master and 3 workers. I have set the number of cores per executor to 4 and the total number of cores to 24, which means 6 executors will be spawned in total. I have set spread-out to true, so each worker machine gets 2 executors. My batch interval is 1 second, and I have repartitioned each batch to 21 partitions; the remaining 3 cores are for receivers. While running, what I observe from the event timeline is that only 3 of the executors are being used; the other 3 sit idle. As far as I know, there is no parameter in Spark standalone mode to specify the number of executors. How do I make Spark use all the available executors?
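
For reference, a minimal sketch of the configuration described above, assuming the totals refer to cores (standalone mode sizes an application via spark.cores.max and spark.executor.cores rather than by an executor count; the master URL and app name are illustrative, and spread-out corresponds to spark.deploy.spreadOut, which is set on the master rather than in the application):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setMaster("spark://master:7077")   // illustrative master URL
      .setAppName("EventProcessor")       // illustrative app name
      .set("spark.executor.cores", "4")   // 4 cores per executor
      .set("spark.cores.max", "24")       // 24 cores total -> 6 executors

    val ssc = new StreamingContext(conf, Seconds(1))  // 1-second batch interval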


1 solution

#1



Your stream probably does not have enough partitions to fill all the executors on every 1-second micro-batch. Try repartition(24) as the first streaming transformation to use the full power of the Spark cluster.

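A minimal sketch of that suggestion, assuming a receiver-based DStream (the socket source, host, and port are illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("EventProcessor")  // illustrative
    val ssc = new StreamingContext(conf, Seconds(1))

    // Illustrative source; any receiver-based DStream applies here.
    val events = ssc.socketTextStream("localhost", 9999)

    // Repartition as the first transformation so every 1-second micro-batch
    // is spread across enough partitions to occupy all executor cores.
    val repartitioned = events.repartition(24)

    repartitioned.foreachRDD { rdd =>
      println(s"partitions in this batch: ${rdd.getNumPartitions}")
    }

    ssc.start()
    ssc.awaitTermination()

Note that since 3 cores are tied up by receivers in the setup above, only 21 cores remain for processing tasks, so a partition count of 21 (or a small multiple of it) may match the available slots more closely than 24.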
