We have set up a 4-node cluster for testing our Spark application. Each node has 250 GB RAM and 48 cores; one node runs the master and the other 3 run as slaves.
We have developed a Spark application in Scala and use spark-submit to run the job. This is the point where we are stuck and need more clarification to proceed.
Query 1: Which is the best option for running a Spark job: a) Spark as master, or b) YARN as master? And what is the difference between them?
Query 2: When running any Spark job, we can provide options such as the number of executors, the number of cores, executor memory, etc.
Could you please advise what the optimal values for these parameters would be for better performance in my case?
Any help would be very much appreciated, since it would be useful for anyone starting out with Spark :)
Thanks!

1 solution
#1
Query 1: YARN is a better resource manager and supports more features than the Spark standalone master. For more details, see Apache Spark Cluster Managers.
Query 2: You can only assign resources at the time of job initialization. There are command-line flags available for this. Also, if you don't wish to pass command-line flags to spark-submit, you can set them when creating the Spark configuration in code. You can see the available flags using spark-submit --help.
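As a minimal sketch, the same resource settings can be supplied either as spark-submit flags or programmatically when building the session; the specific values here are illustrative only, not a recommendation:

```scala
import org.apache.spark.sql.SparkSession

object ResourceConfigExample {
  def main(args: Array[String]): Unit = {
    // Roughly equivalent to:
    //   spark-submit --master yarn --num-executors 9 \
    //     --executor-cores 5 --executor-memory 20g ...
    val spark = SparkSession.builder()
      .appName("resource-config-example")
      .config("spark.executor.instances", "9")
      .config("spark.executor.cores", "5")
      .config("spark.executor.memory", "20g")
      .getOrCreate()

    // ... job logic goes here ...

    spark.stop()
  }
}
```

Note that properties set explicitly in code take precedence over spark-submit flags, which in turn override values from spark-defaults.conf.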
For more information, visit Spark Configuration.
Choosing resources depends mainly on the size of the data you want to process and the complexity of the problem.
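As a rough sketch of that sizing arithmetic for the cluster in the question (3 worker nodes, each with 48 cores and 250 GB RAM), one widely used heuristic works out as follows; treat the result as a starting point to benchmark against, not a tuned answer:

```scala
object ExecutorSizing {
  def main(args: Array[String]): Unit = {
    // Cluster shape from the question: 3 worker nodes,
    // 48 cores and 250 GB RAM each.
    val workerNodes  = 3
    val coresPerNode = 48
    val memPerNodeGb = 250

    // Heuristic: reserve 1 core per node for the OS and Hadoop/YARN
    // daemons, and cap executors at ~5 cores each to keep HDFS I/O
    // throughput healthy.
    val usableCores      = coresPerNode - 1                 // 47
    val coresPerExecutor = 5
    val executorsPerNode = usableCores / coresPerExecutor   // 9

    // Split each node's memory across its executors, then leave
    // ~10% for off-heap overhead (spark.executor.memoryOverhead).
    val memPerExecutorGb =
      (memPerNodeGb.toDouble / executorsPerNode * 0.9).toInt  // 25

    // On YARN, leave one executor slot for the ApplicationMaster.
    val numExecutors = executorsPerNode * workerNodes - 1   // 26

    println(s"--num-executors $numExecutors " +
      s"--executor-cores $coresPerExecutor " +
      s"--executor-memory ${memPerExecutorGb}g")
  }
}
```

Running this prints a plausible starting point for spark-submit: `--num-executors 26 --executor-cores 5 --executor-memory 25g`. From there, profile your actual job and adjust.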
Please also visit 5 mistakes to avoid while writing Spark applications.