
Time: 2022-05-12 11:21:40

I want to change the Typesafe Config of a Spark job depending on the environment (dev/prod). It seems to me that the easiest way to do this is to pass -Dconfig.resource=ENVNAME to the job; the Typesafe Config library will then do the rest for me.

Is there a way to pass that option directly to the job? Or is there a better way to change the job config at runtime?




  • Nothing happens when I add the --conf "spark.executor.extraJavaOptions=-Dconfig.resource=dev" option to the spark-submit command.
  • I get "Error: Unrecognized option '-Dconfig.resource=dev'." when I pass -Dconfig.resource=dev directly to the spark-submit command.

7 solutions



Change the spark-submit command line by adding three options:


  • --files <location_to_your_app.conf>
  • --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app'
  • --conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app'
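A minimal dry-run sketch of those three options, assuming one config file per environment. The names conf/dev.conf, conf/prod.conf, com.example.MyJob, my-job.jar and APP_ENV are hypothetical. It sets the same -D flag for the driver and the executors. One detail worth checking against your setup: Typesafe Config's README describes config.resource as a resource name rather than a basename (application.conf, not application), so this sketch passes the extension. It also assumes the shipped file ends up on each JVM's classpath; if it does not, -Dconfig.file=<path> is the alternative.

```shell
#!/usr/bin/env bash
# Dry-run sketch: assemble a spark-submit command whose Typesafe Config
# resource depends on the environment. All names are hypothetical.
set -euo pipefail

build_submit_cmd() {
  local env="$1"                        # e.g. dev or prod
  local conf_file="conf/${env}.conf"    # one file per environment
  # config.resource expects the full resource name, extension included.
  local java_opts="-Dconfig.resource=${env}.conf"

  # Same -D flag for the driver and for the executors.
  SUBMIT_CMD=(
    spark-submit
    --files "$conf_file"
    --conf "spark.driver.extraJavaOptions=${java_opts}"
    --conf "spark.executor.extraJavaOptions=${java_opts}"
    --class com.example.MyJob
    my-job.jar
  )
}

build_submit_cmd "${APP_ENV:-dev}"
# Print the command, one shell-quoted word per argument, instead of running it.
printf '%q ' "${SUBMIT_CMD[@]}"
echo
```

Running it with APP_ENV=prod switches both the shipped file and the resource name; once the printed command looks right, "${SUBMIT_CMD[@]}" can be executed directly.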



Here is how I run my Spark program with additional Java options:


/home/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \
--files /home/spark/jobs/fact_stats_ad.conf \
--conf spark.executor.extraJavaOptions=-Dconfig.fuction.conf \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.MostAvailableFirstPolicy -Dconfig.file=/home/spark/jobs/fact_stats_ad.conf' \
--class jobs.DiskDailyJob \
--packages com.databricks:spark-csv_2.10:1.4.0 \
--jars /home/spark/jobs/alluxio-core-client-1.2.0-RC2-jar-with-dependencies.jar \
--driver-memory 2g \
/home/spark/jobs/convert_to_parquet.jar \
AD_COOKIE_REPORT FACT_AD_STATS_DAILY | tee /data/fact_ad_stats_daily.log

As you can see, there is the custom config file: --files /home/spark/jobs/fact_stats_ad.conf


the executor Java options: --conf spark.executor.extraJavaOptions=-Dconfig.fuction.conf

and the driver Java options: --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.MostAvailableFirstPolicy -Dconfig.file=/home/spark/jobs/fact_stats_ad.conf'

Hope this helps.
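The driver options above point config.file at an absolute path, which only works on machines where that path exists. A sketch of giving the executors the same file, assuming client deploy mode (the driver runs where the file lives) and relying on the spark-submit help text, which says --files entries are "placed in the working directory of each executor". Typesafe Config opens config.file as a plain file path, so a relative name resolves against the JVM's working directory. The path comes from the answer; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: point the driver and the executors at the same Typesafe Config
# file shipped with --files. Assumes client deploy mode.
set -euo pipefail

CONF_PATH=/home/spark/jobs/fact_stats_ad.conf
CONF_NAME="$(basename "$CONF_PATH")"

# Driver (client mode): reads the file from the submitting machine.
DRIVER_OPTS="-Dconfig.file=${CONF_PATH}"
# Executors: --files copies the file into each executor's working
# directory, so the bare file name is enough there.
EXECUTOR_OPTS="-Dconfig.file=${CONF_NAME}"

# Dry run: print the relevant arguments instead of submitting.
printf '%s\n' \
  --files "$CONF_PATH" \
  --conf "spark.driver.extraJavaOptions=${DRIVER_OPTS}" \
  --conf "spark.executor.extraJavaOptions=${EXECUTOR_OPTS}"
```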




I had a lot of problems passing -D parameters to Spark executors and the driver. Here is a quote from my blog post about it: "The right way to pass the parameter is through the properties spark.driver.extraJavaOptions and spark.executor.extraJavaOptions. I passed both the log4j configuration property and the parameter that I needed for the configuration. (To the driver I was able to pass only the log4j configuration.)" For example, in a properties file passed to spark-submit with --properties-file:

spark.driver.extraJavaOptions -Dlog4j.configuration=file:///spark/conf/log4j.properties
spark.executor.extraJavaOptions -Dlog4j.configuration=file:///spark/conf/log4j.properties -Dapplication.properties.file=hdfs:///some/path/on/hdfs/app.properties
spark.application.properties.file hdfs:///some/path/on/hdfs/app.properties

You can read my blog post about Spark configuration in general. I am running on YARN as well.
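Applied to the dev/prod question, the properties-file approach could look like the sketch below: one file per environment, selected at submit time. The file names, the temporary directory and my-job.jar are hypothetical, and the command is only printed. Note that the -D flags must use a plain ASCII hyphen; a typographic dash is not a valid JVM option.

```shell
#!/usr/bin/env bash
# Sketch: one Spark properties file per environment, chosen with
# --properties-file. File names and paths are hypothetical.
set -euo pipefail

conf_dir="$(mktemp -d)"

# Java-properties format: key, whitespace, value.
for env in dev prod; do
  cat > "${conf_dir}/spark-${env}.properties" <<EOF
spark.driver.extraJavaOptions -Dconfig.resource=${env}.conf
spark.executor.extraJavaOptions -Dconfig.resource=${env}.conf
EOF
done

APP_ENV="${APP_ENV:-dev}"
# Dry run: print the command instead of executing it.
echo spark-submit --properties-file "${conf_dir}/spark-${APP_ENV}.properties" my-job.jar
```

Per the spark-submit help text, conf/spark-defaults.conf is only looked up when --properties-file is not given, so any defaults you rely on would need to be copied into these files.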




--files <location_to_your_app.conf> --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app' --conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app'


If you write it this way, the later --conf will overwrite the previous one; you can verify this on the Environment tab of the Spark UI after the job has started.


So the correct way is to put all the options in a single value, like this: --conf 'spark.executor.extraJavaOptions=-Da=b -Dc=d'. If you do this, all your settings will show up in the Spark UI.
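A small sketch of building that single value from several -D flags, so each key is passed to --conf only once (when the same key is given twice, only the last occurrence is kept). java_opts_conf is a hypothetical helper, not part of Spark.

```shell
#!/usr/bin/env bash
# Sketch: build ONE --conf value that carries several -D flags instead of
# repeating the same --conf key.
set -euo pipefail

# java_opts_conf KEY FLAG... -> prints KEY=FLAG1 FLAG2 ...
java_opts_conf() {
  local key="$1"
  shift
  local IFS=' '   # "$*" joins the remaining arguments with the first IFS char
  printf '%s=%s\n' "$key" "$*"
}

java_opts_conf spark.executor.extraJavaOptions -Da=b -Dc=d
# prints: spark.executor.extraJavaOptions=-Da=b -Dc=d
```

On the command line, wrap the result in double quotes so the space inside it does not split the value: spark-submit --conf "$(java_opts_conf spark.executor.extraJavaOptions -Da=b -Dc=d)" ...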




I start my Spark application with a spark-submit command launched from within another Scala application, so I have an Array like:

Array(".../spark-submit", ..., "--conf", confValues, ...)

where confValues is:


  • for yarn-cluster mode:
    "spark.driver.extraJavaOptions=-Drun.mode=production -Dapp.param=..."
  • for local[*] mode:
    "run.mode=development"

It is a bit tricky, though, to figure out where (and where not) to escape quotes and spaces. You can check the resulting system property values on the Environment tab of the Spark web UI.
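A small bash sketch of where quotes matter. In a terminal, the single quotes around a --conf value are removed by the shell before spark-submit runs. When the command is built as an array in code and handed to the process directly (an assumption about how the Scala launcher above runs it, i.e. without an intermediate shell), each element is already one argument, so those quotes should be left out. count_args is a hypothetical helper and -Dapp.param=x a placeholder value.

```shell
#!/usr/bin/env bash
# Sketch: count how many arguments a command really receives, to see
# where quotes matter. -Dapp.param=x is a placeholder value.
set -euo pipefail

count_args() { echo "$#"; }

# Shell quoting: the shell strips the quotes; the value stays ONE argument.
count_args --conf 'spark.driver.extraJavaOptions=-Drun.mode=production -Dapp.param=x'   # prints 2

# Array form (like building the command in code): no quotes are needed,
# each element is already exactly one argument.
args=(--conf "spark.driver.extraJavaOptions=-Drun.mode=production -Dapp.param=x")
count_args "${args[@]}"   # prints 2

# Unquoted variable: word splitting cuts the value in two.
value="spark.driver.extraJavaOptions=-Drun.mode=production -Dapp.param=x"
count_args --conf $value   # prints 3
```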



Using a command like the one below may help:


spark-submit --master local[2] --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' --class com.test.spark.application.TestSparkJob target/application-0.0.1-SNAPSHOT-jar-with-dependencies.jar prod


I have tried it and it worked for me. I would also suggest going through the following Spark documentation page, which is really helpful: https://spark.apache.org/docs/latest/running-on-yarn.html




I originally had this config file:


my-app {
  environment: dev
  other: xxx
}

This is how I load the config in my Spark Scala code:

val config = ConfigFactory.parseFile(new File("my-app.conf"))

With this setup, despite what the Typesafe Config documentation and all the other answers say, the system property override didn't work for me when I launched my Spark job like this:


spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name my-app \
  --driver-java-options='-XX:MaxPermSize=256M -Dmy-app.environment=prod' \
  --files my-app.conf \

To get it to work I had to change my config file to:


my-app {
  environment: dev
  environment: ${?env.override}
  other: xxx
}

and then launch it like so:


spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name my-app \
  --driver-java-options='-XX:MaxPermSize=256M -Denv.override=prod' \
  --files my-app.conf \


