How to pass -D parameters or environment variables to a Spark job?

Time: 2022-05-12 11:21:40

I want to change Typesafe config of a Spark job in dev/prod environment. It seems to me that the easiest way to accomplish this is to pass -Dconfig.resource=ENVNAME to the job. Then Typesafe config library will do the job for me.

Is there a way to pass that option directly to the job? Or maybe there is a better way to change the job config at runtime?

EDIT:

  • Nothing happens when I add the --conf "spark.executor.extraJavaOptions=-Dconfig.resource=dev" option to the spark-submit command.
  • I got Error: Unrecognized option '-Dconfig.resource=dev'. when I passed -Dconfig.resource=dev directly to the spark-submit command.

7 Solutions

#1


35  

Change the spark-submit command line, adding three options (a combined example follows the list):

  • --files <location_to_your_app.conf>
  • --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app'
  • --conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app'
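
Put together, a minimal sketch of what the full command might look like (the class name and paths are placeholders; note that Typesafe Config's config.resource expects the full resource name, including the .conf extension):

spark-submit \
  --files /path/to/dev.conf \
  --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=dev.conf' \
  --conf 'spark.driver.extraJavaOptions=-Dconfig.resource=dev.conf' \
  --class com.example.MyJob \
  /path/to/my-job.jar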

#2


13  

Here is my Spark program run with additional Java options:

/home/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \
--files /home/spark/jobs/fact_stats_ad.conf \
--conf spark.executor.extraJavaOptions=-Dconfig.fuction.conf \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.MostAvailableFirstPolicy -Dconfig.file=/home/spark/jobs/fact_stats_ad.conf' \
--class jobs.DiskDailyJob \
--packages com.databricks:spark-csv_2.10:1.4.0 \
--jars /home/spark/jobs/alluxio-core-client-1.2.0-RC2-jar-with-dependencies.jar \
--driver-memory 2g \
/home/spark/jobs/convert_to_parquet.jar \
AD_COOKIE_REPORT FACT_AD_STATS_DAILY | tee /data/fact_ad_stats_daily.log

As you can see, the custom config file is passed with --files /home/spark/jobs/fact_stats_ad.conf,

the executor Java options with --conf spark.executor.extraJavaOptions=-Dconfig.fuction.conf,

and the driver Java options with --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.MostAvailableFirstPolicy -Dconfig.file=/home/spark/jobs/fact_stats_ad.conf'.
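
On the application side, a minimal sketch of how a -Dconfig.file or -Dconfig.resource override like this is typically picked up, assuming the job uses Typesafe Config (the key name below is illustrative, not from the original job):

import com.typesafe.config.ConfigFactory

// ConfigFactory.load() honours the config.file / config.resource system
// properties set through extraJavaOptions, falling back to application.conf.
val config = ConfigFactory.load()
val inputTable = config.getString("fact_stats_ad.input_table") // illustrative key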

Hope it helps.

#3


7  

I had a lot of problems with passing -D parameters to the Spark executors and the driver. I've added a quote from my blog post about it: "The right way to pass the parameters is through the properties spark.driver.extraJavaOptions and spark.executor.extraJavaOptions: I've passed both the log4j configuration property and the parameter that I needed for the configuration. (To the driver I was able to pass only the log4j configuration.) For example (written in a properties file passed to spark-submit with --properties-file):"

spark.driver.extraJavaOptions -Dlog4j.configuration=file:///spark/conf/log4j.properties
spark.executor.extraJavaOptions -Dlog4j.configuration=file:///spark/conf/log4j.properties -Dapplication.properties.file=hdfs:///some/path/on/hdfs/app.properties
spark.application.properties.file hdfs:///some/path/on/hdfs/app.properties
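
A sketch of how such a properties file might be handed to spark-submit (the file, class, and jar names are placeholders):

spark-submit \
  --properties-file /path/to/job.properties \
  --class com.example.MyJob \
  my-job.jar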

You can read my blog post about the overall configuration of Spark. I am running on YARN as well.

#4


6  

--files <location_to_your_app.conf> --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app' --conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app'

If you write it this way, the later --conf will overwrite the previous one; you can verify this by looking at the Environment tab in the Spark UI after the job has started.

So the correct way is to put the options on the same line, like this: --conf 'spark.executor.extraJavaOptions=-Da=b -Dc=d'. If you do this, you will find all your settings shown in the Spark UI.
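
For instance, to keep both the Typesafe Config override and a log4j setting, a sketch combining them into one option per process (the values are illustrative):

--conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app.conf -Dlog4j.configuration=file:/tmp/log4j.properties' \
--conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app.conf -Dlog4j.configuration=file:/tmp/log4j.properties'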

#5


2  

I am starting my Spark application via a spark-submit command launched from within another Scala application, so I have an Array like

Array(".../spark-submit", ..., "--conf", confValues, ...)

where confValues is:

  • for yarn-cluster mode:
    "spark.driver.extraJavaOptions=-Drun.mode=production -Dapp.param=..."
  • for local[*] mode:
    "run.mode=development"

It is a bit tricky to understand where (not) to escape quotes and spaces, though. You can check the Spark web interface for system property values.
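
A minimal sketch of launching spark-submit from Scala with scala.sys.process, assuming placeholder paths and class names; because the arguments are passed as a Seq rather than through a shell, the spaces inside confValues need no extra quoting:

import scala.sys.process._

val confValues = "spark.driver.extraJavaOptions=-Drun.mode=production -Dapp.param=..."

val cmd = Seq(
  "/opt/spark/bin/spark-submit",  // placeholder spark-submit path
  "--master", "yarn",
  "--deploy-mode", "cluster",
  "--conf", confValues,
  "--class", "com.example.MyJob", // placeholder class
  "/path/to/my-job.jar"           // placeholder jar
)
val exitCode = cmd.!              // run the command and wait for it to finish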

#6


0  

Using the method in the command below may be helpful for you:

spark-submit --master local[2] --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' --class com.test.spark.application.TestSparkJob target/application-0.0.1-SNAPSHOT-jar-with-dependencies.jar prod

I have tried this and it worked for me. I would also suggest going through the Spark docs page below, which is really helpful: https://spark.apache.org/docs/latest/running-on-yarn.html
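
The trailing prod is an application argument; the original answer does not show the application code, but here is a hedged sketch of one way it might be consumed with Typesafe Config (the resource and key names are assumptions):

import com.typesafe.config.ConfigFactory

object TestSparkJob {
  def main(args: Array[String]): Unit = {
    val env = args.headOption.getOrElse("dev")
    // Loads application-prod.conf from the classpath when "prod" is passed.
    val config = ConfigFactory.load(s"application-$env")
    println(config.getString("service.endpoint")) // illustrative key
  }
}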

#7


0  

I originally had this config file:

my-app {
  environment: dev
  other: xxx
}

This is how I'm loading my config in my Spark Scala code:

import com.typesafe.config.ConfigFactory

val config = ConfigFactory.parseFile(new java.io.File("my-app.conf"))
  .withFallback(ConfigFactory.load())
  .resolve()
  .getConfig("my-app")

With this setup, despite what the Typesafe Config documentation and all the other answers say, the system property override didn't work for me when I launched my spark job like so:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name my-app \
  --driver-java-options='-XX:MaxPermSize=256M -Dmy-app.environment=prod' \
  --files my-app.conf \
  my-app.jar

To get it to work I had to change my config file to:

my-app {
  environment: dev
  environment: ${?env.override}
  other: xxx
}

and then launch it like so:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name my-app \
  --driver-java-options='-XX:MaxPermSize=256M -Denv.override=prod' \
  --files my-app.conf \
  my-app.jar
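
The ${?env.override} line is Typesafe Config's optional substitution: when the env.override system property is visible through ConfigFactory.load() in the fallback, it replaces the dev default, and when it is absent the first environment line wins. (The earlier -Dmy-app.environment=prod attempt most likely lost because parseFile(...) sits in front of ConfigFactory.load() in the fallback chain, so the file's own value shadows the system property.) A minimal sketch to check which value was resolved, reusing the loading code above:

import com.typesafe.config.ConfigFactory

val config = ConfigFactory.parseFile(new java.io.File("my-app.conf"))
  .withFallback(ConfigFactory.load())
  .resolve()
  .getConfig("my-app")

// With -Denv.override=prod this prints "prod"; without it, "dev".
println(config.getString("environment"))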
