I want to change the Typesafe config of a Spark job between dev and prod environments. It seems to me that the easiest way to accomplish this is to pass -Dconfig.resource=ENVNAME
to the job; the Typesafe Config library will then do the rest for me.
Is there a way to pass that option directly to the job? Or is there a better way to change the job config at runtime?
EDIT:
- Nothing happens when I add the
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=dev"
option to the spark-submit command.
- I get
Error: Unrecognized option '-Dconfig.resource=dev'.
when I pass -Dconfig.resource=dev
directly to the spark-submit command.
7 Answers
#1
35
Change the spark-submit command line, adding three options:
--files <location_to_your_app.conf>
--conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app'
--conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app'
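With app.conf shipped via --files and -Dconfig.resource=app set for both the driver and the executors, a plain ConfigFactory.load() in the application code should then pick up the right file (on YARN, files distributed with --files land in each container's working directory, which is typically on the container classpath). A minimal sketch; the key name is made up purely for illustration:
import com.typesafe.config.ConfigFactory
// load() honours -Dconfig.resource / -Dconfig.file, so the same code
// works unchanged across dev and prod.
val config = ConfigFactory.load()
// "my-app.db.url" is a hypothetical key, only to show the lookup.
val dbUrl = config.getString("my-app.db.url")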
#2
13
Here is my Spark program, run with additional Java options:
/home/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \
--files /home/spark/jobs/fact_stats_ad.conf \
--conf spark.executor.extraJavaOptions=-Dconfig.fuction.conf \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.MostAvailableFirstPolicy -Dconfig.file=/home/spark/jobs/fact_stats_ad.conf' \
--class jobs.DiskDailyJob \
--packages com.databricks:spark-csv_2.10:1.4.0 \
--jars /home/spark/jobs/alluxio-core-client-1.2.0-RC2-jar-with-dependencies.jar \
--driver-memory 2g \
/home/spark/jobs/convert_to_parquet.jar \
AD_COOKIE_REPORT FACT_AD_STATS_DAILY | tee /data/fact_ad_stats_daily.log
As you can see, the custom config file is shipped with --files /home/spark/jobs/fact_stats_ad.conf,
the executor Java options are set with --conf spark.executor.extraJavaOptions=-Dconfig.fuction.conf,
and the driver Java options are set with --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.MostAvailableFirstPolicy -Dconfig.file=/home/spark/jobs/fact_stats_ad.conf'.
Hope it helps.
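If you want to confirm that the -D options actually reached both sides, a quick check like the following (a sketch, assuming an existing SparkContext named sc and that -Dconfig.file was set as above) prints the system property as seen by the driver and by an executor task:
// Driver side: the property set via spark.driver.extraJavaOptions.
println(sys.props.get("config.file"))
// Executor side: collect the property value as seen inside a task.
val onExecutors = sc.parallelize(Seq(1))
  .map(_ => sys.props.get("config.file").toString)
  .collect()
println(onExecutors.mkString(", "))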
#3
7
I had a lot of problems passing -D parameters to the Spark executors and the driver, so here is a quote from my blog post about it: "The right way to pass the parameters is through the properties spark.driver.extraJavaOptions and spark.executor.extraJavaOptions: I passed both the log4j configuration property and the parameter that I needed for my configuration. (To the driver I was able to pass only the log4j configuration.) For example (written in a properties file passed to spark-submit with --properties-file):
spark.driver.extraJavaOptions -Dlog4j.configuration=file:///spark/conf/log4j.properties
spark.executor.extraJavaOptions -Dlog4j.configuration=file:///spark/conf/log4j.properties -Dapplication.properties.file=hdfs:///some/path/on/hdfs/app.properties
spark.application.properties.file hdfs:///some/path/on/hdfs/app.properties
"
You can read my blog post about the overall configuration of Spark. I'm running on YARN as well.
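For the custom spark.application.properties.file key above, one possible way to consume it on the driver side (my own sketch of the wiring, not code from the blog post) is to read it from the SparkConf and load the properties file from HDFS:
import java.util.Properties
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}
val sc = new SparkContext(new SparkConf())
// The custom key set in the --properties-file above.
val propsPath = sc.getConf.get("spark.application.properties.file")
// Open the file on HDFS and load it as java.util.Properties.
val fs = FileSystem.get(sc.hadoopConfiguration)
val in = fs.open(new Path(propsPath))
val appProps = new Properties()
try appProps.load(in) finally in.close()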
#4
6
--files <location_to_your_app.conf> --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app' --conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app'
If you write it this way, a later --conf entry for the same property will overwrite the earlier one; you can verify this in the Environment tab of the Spark UI after the job has started.
So the correct way is to put the options on the same line, like this: --conf 'spark.executor.extraJavaOptions=-Da=b -Dc=d'. If you do this, you will find all your settings shown in the Spark UI.
#5
2
I am starting my Spark application via a spark-submit command launched from within another Scala application, so I have an Array like
Array(".../spark-submit", ..., "--conf", confValues, ...)
where confValues is:
- for yarn-cluster mode: "spark.driver.extraJavaOptions=-Drun.mode=production -Dapp.param=..."
- for local[*] mode: "run.mode=development"
It is a bit tricky to understand where (not) to escape quotes and spaces, though. You can check the Spark web interface for system property values.
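A minimal sketch of how such a launch might look with scala.sys.process (the paths, class name, and jar below are hypothetical placeholders); because each element of the array is passed to the child process as a single argument rather than through a shell, the spaces inside confValues need no extra quoting:
import scala.sys.process._
// Hypothetical values, for illustration only.
val sparkSubmit = "/opt/spark/bin/spark-submit"
val confValues  = "spark.driver.extraJavaOptions=-Drun.mode=production"
val args = Array(
  sparkSubmit,
  "--master", "yarn-cluster",
  "--conf", confValues,
  "--class", "jobs.MyJob",
  "/path/to/my-job.jar"
)
// Launch spark-submit as a child process and wait for its exit code.
val exitCode = args.toSeq.!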
#6
0
Using the options as in the command below may be helpful for you:
spark-submit --master local[2] --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' --class com.test.spark.application.TestSparkJob target/application-0.0.1-SNAPSHOT-jar-with-dependencies.jar prod
I have tried it and it worked for me. I would also suggest going through the Spark documentation linked below, which is really helpful: https://spark.apache.org/docs/latest/running-on-yarn.html
#7
0
I originally had this config file:
my-app {
environment: dev
other: xxx
}
This is how I'm loading my config in my Spark Scala code:
import java.io.File
import com.typesafe.config.ConfigFactory

val config = ConfigFactory.parseFile(new File("my-app.conf"))
  .withFallback(ConfigFactory.load())
  .resolve()
  .getConfig("my-app")
With this setup, despite what the Typesafe Config documentation and all the other answers say, the system property override didn't work for me when I launched my Spark job like so:
spark-submit \
--master yarn \
--deploy-mode cluster \
--name my-app \
--driver-java-options='-XX:MaxPermSize=256M -Dmy-app.environment=prod' \
--files my-app.conf \
my-app.jar
To get it to work I had to change my config file to:
my-app {
environment: dev
environment: ${?env.override}
other: xxx
}
and then launch it like so:
spark-submit \
--master yarn \
--deploy-mode cluster \
--name my-app \
--driver-java-options='-XX:MaxPermSize=256M -Denv.override=prod' \
--files my-app.conf \
my-app.jar
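A likely explanation for why the plain override didn't work: the parsed file takes precedence over ConfigFactory.load() (which is only the fallback here), so -Dmy-app.environment=prod is shadowed by the file's own environment: dev, whereas the optional substitution ${?env.override} is resolved by resolve() against the system properties pulled in through that fallback. A quick usage check (a sketch, reusing the config value from the loading code above):
// Launched with -Denv.override=prod:
config.getString("environment")   // "prod"
// Launched without -Denv.override:
config.getString("environment")   // "dev"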